yunhailuo closed this issue 5 years ago
Can you try the "megarow" build of BEDOPS? Please see step 4 here: https://bedops.readthedocs.io/en/latest/content/installation.html#via-source-code
The "megarow" build should allow a longer line. Or you can adjust constants as described in the error message, but the custom build may help you directly.
@alexpreynolds Thank you for your quick reply. Unfortunately, I got the same error (though I'm not sure if it's from the same row). Do you have any other suggestions? Is there a way to figure out which row is problematic?
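One way to narrow down a problematic row is to look for the longest line and report where it sits in the file. The sketch below is only an illustration: `longest_line` is a hypothetical helper name, not part of BEDOPS, and it assumes the failure is related to line length, which the error message hints at but does not prove.

```shell
# Hypothetical helper (not part of BEDOPS): print "<line number> <byte length>"
# for the longest line in the file given as $1.
longest_line() {
    # LC_ALL=C makes awk count bytes rather than multibyte characters.
    # On ties, the first line with the maximum length is reported.
    LC_ALL=C awk 'length($0) > max { max = length($0); n = NR }
                  END { print n, max }' "$1"
}

# Example (assumes the uncompressed VCF is in the current directory):
# longest_line GCF_000001405.25
```

The reported line can then be inspected on its own, for example with `sed -n '<N>p' file | head -c 200`, to see what kind of record it is.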
I'm away until Tuesday but will try out the FTP link as soon as I'm back. I'm sure we can figure this out.
Please take your time. Thank you in advance.
I tested the megarow build of convert2bed (which vcf2bed calls) and it was able to convert the FTPed VCF file without errors:
$ git clone https://github.com/bedops/bedops.git
$ cd bedops
$ make megarow
...
$ ./applications/bed/conversion/bin/convert2bed-megarow --input=vcf --do-not-split --do-not-sort < ../GCF_000001405.25.vcf > ../GCF_000001405.25.bed 2> ../GCF_000001405.25.bed.log
$ ls -al ../GCF_000001405.25.bed*
-rw-r--r-- 1 areynolds stamlab 100657233343 Nov 12 15:43 GCF_000001405.25.bed
-rw-r--r-- 1 areynolds stamlab 0 Nov 12 15:21 ../GCF_000001405.25.bed.log
Perhaps you are still running the so-called typical binaries, which are compiled with line-length limits that would affect conversion of this specific VCF file.
Can you please list the steps you took to build the megarow binaries? Or can you please describe how you are installing BEDOPS, as well as what platform/kernel you are using? Thanks for your patience.
Thank you so much for trying it out, @alexpreynolds.
Based on my history, I had run:
$ git clone https://github.com/bedops/bedops.git
$ cd bedops/
$ make all
$ make install_all
$ export PATH="/home/ubuntu/dbsnp/bedops/bin:$PATH"
$ bin/vcf2bed-megarow --do-not-split --do-not-sort < /home/ubuntu/dbsnp/GCF_000001405.25 > GCF_000001405.25.bed
I tried these on Ubuntu:
Distributor ID: Ubuntu
Description: Ubuntu 14.04.6 LTS
Release: 14.04
Codename: trusty
I'll purge everything, try your steps and let you know.
It worked with the following:
$ git clone https://github.com/bedops/bedops.git
...
$ cd bedops/
$ make megarow
...
$ make install_megarow
...
$ export PATH="/home/ubuntu/dbsnp/bedops/bin:$PATH"
$ bin/convert2bed-megarow --input=vcf --do-not-split --do-not-sort < ../GCF_000001405.25 > ../GCF_000001405.25.bed 2> ../GCF_000001405.25.bed.log
$ cat ../GCF_000001405.25.bed.log
-bash: bin/convert2bed-megarow: No such file or directory
$ bin/convert2bed --input=vcf --do-not-split --do-not-sort < ../GCF_000001405.25 > ../GCF_000001405.25.bed 2> ../GCF_000001405.25.bed.log
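The log above shows that bin/convert2bed-megarow did not exist after this install, and that bin/convert2bed was used instead. A small guard like the following can make that kind of mismatch visible before a long conversion starts. It is a sketch only: `first_executable` is a hypothetical helper, and it says nothing about which build type a given binary actually is.

```shell
# Hypothetical helper: print the first argument that is an executable regular
# file, or fail with a message on stderr if none of them is.
first_executable() {
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "no executable found among: $*" >&2
    return 1
}

# Example: prefer the -megarow name, fall back to the plain name.
# converter=$(first_executable bin/convert2bed-megarow bin/convert2bed) || exit 1
# "$converter" --input=vcf --do-not-split --do-not-sort < in.vcf > out.bed
```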
Thank you very much for all the help!
Hello - I'm experiencing a similar issue:
./vcf2bed-megarow --keep-header < input.vcf
Error: Could not find newline in intermediate buffer; check input [39704 | 1405 | 41109]
Please check that your input contains Unix newlines (cat -A) or increase TOKENS_MAX_LENGTH in BEDOPS.Constants.hpp and recompile BEDOPS.
vcf2bed-megarow was installed via make all and I can't reinstall with make megarow, because I'm unable to install the required static libraries on the computing cluster I'm using. Is there an alternative solution?
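The error message suggests checking for Unix newlines with cat -A. On a large file it may be easier to count the lines that end in a carriage return instead. This is a minimal sketch, and `count_crlf_lines` is a hypothetical helper name. A count of zero does not rule out other causes such as an overlong line.

```shell
# Hypothetical helper: count lines in file $1 that end in a carriage return,
# which is what DOS/Windows (CRLF) line endings look like on Unix.
count_crlf_lines() {
    # "n + 0" forces a numeric 0 when no line matches.
    LC_ALL=C awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}

# Example:
# count_crlf_lines input.vcf
```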
You could perhaps try the precompiled binaries on the Releases page:
https://github.com/bedops/bedops/releases
If you're on Linux, you can use the instructions here to extract and put items in a useful directory:
https://bedops.readthedocs.io/en/latest/content/installation.html#linux
Once installed, you should then be able to use the switch-BEDOPS-binary-type helper script to switch between typical and megarow (and float128, though probably not useful here):
$ switch-BEDOPS-binary-type --help
Switch the BEDOPS binary build to typical, megarow, or float128
Usage: switch-BEDOPS-binary-type [ --help ] [ --typical | --megarow | --float128 ] [ <binary-directory> (optional) ]
That worked, thank you!
/usr/local/bin/vcf2bed-megarow --input=vcf --do-not-split --do-not-sort --max-mem 30G <$(zcat 5000.genotype.vcffilter.vcf.gz.gz) >test.output.sort.bed
-bash: xrealloc: cannot allocate 18446744071562067968 bytes (1331200 bytes allocated)
Has anyone encountered this problem?
Looking at this:
zcat 5000.genotype.vcffilter.vcf.gz.gz
There are two gz extensions. Is it possible this is (for whatever reason) doubly compressed and needs a second decompression? For example:
... <(gunzip -c 5000.genotype.vcffilter.vcf.gz.gz | gunzip -c) ...
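Two things can be checked here. First, the original command uses `<$(zcat ...)`, which is command substitution: bash captures the whole decompressed output and tries to use it as a file name. That is a plausible source of the bash xrealloc error. The suggestion above uses `<(...)`, which is process substitution and streams the data instead. Second, whether the file is doubly compressed can be tested by decompressing once and looking for the gzip magic bytes (1f 8b). The sketch below assumes `gunzip`, `head -c`, and `od` are available; `starts_with_gzip_magic` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if stdin begins with the gzip magic bytes 1f 8b.
starts_with_gzip_magic() {
    # Read the first two bytes, print them as hex, and strip whitespace.
    [ "$(head -c 2 | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# Example: decompress once, then test whether the result is still gzip data.
# if gunzip -c 5000.genotype.vcffilter.vcf.gz.gz | starts_with_gzip_magic; then
#     echo "still compressed after one pass: decompress twice"
# fi
```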
I tried to convert this dbSNP VCF file to BED: ftp://ftp.ncbi.nlm.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.25.gz
I gunzipped it and ran the following command:
bin/vcf2bed --do-not-split --do-not-sort < GCF_000001405.25 > GCF_000001405.25.bed &
After running normally for a while (the output looked fine), I got:
I tried to check lines as mentioned in #208 with
awk '{print length($0);}' GCF_000001405.25 | sort -nr | head -3
and got:
It doesn't seem like any line goes beyond 5MB. Free memory at the start was about 30GB and I used the default 2G --max-mem.
Any suggestions on what I'm missing, @alexpreynolds?