Closed sigven closed 8 years ago
How many variants in the query? How many processes are you using via -p? I thought this was resolved, but I will try to recreate.
102k approx in query. Not specifying the number of processes, should I do that?
Basically running ./vcfanno conf.toml query.vcf.gz > query.annotated.vcf
OK. You can specify the number of processes with -p, but I doubt that will help. I will have a look.
actually, specifying -p > 1 does prevent the error. With only 1 thread, a bunch of work is accumulating--that work involves having tabix queries ready to start iterating but the worker can't keep up and so too many file handles accumulate (along with too much work). For now, even using -p 2 seems to resolve the problem (and you'll get a nice speedup). I'll figure out what to do with -p 1 for the next release.
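To illustrate the accumulation described above, here is a minimal Go sketch of one common way to cap outstanding work: a counting semaphore that blocks the producer once a fixed number of "handles" are open. This is not vcfanno's actual code or necessarily the fix that was applied; the function name `peakOutstanding` and the simulated handles (a plain counter instead of real tabix queries) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// peakOutstanding simulates a producer that "opens" one handle per work item
// and a single consumer that "closes" it. limit caps how many handles may be
// open at once. It returns the peak number of open handles observed.
// Hypothetical helper for illustration only.
func peakOutstanding(items, limit int) int {
	sem := make(chan struct{}, limit) // counting semaphore: one slot per open handle
	work := make(chan int, items)     // queue large enough that it never blocks the producer

	var mu sync.Mutex
	open, peak := 0, 0

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // consumer: closes handles as it processes work
		defer wg.Done()
		for range work {
			mu.Lock()
			open--
			mu.Unlock()
			<-sem // release the slot only after the handle is closed
		}
	}()

	for i := 0; i < items; i++ {
		sem <- struct{}{} // blocks while `limit` handles are already open
		mu.Lock()
		open++
		if open > peak {
			peak = open
		}
		mu.Unlock()
		work <- i
	}
	close(work)
	wg.Wait()
	return peak
}

func main() {
	// Without the semaphore, the producer could open all 1000 handles
	// before the consumer catches up; with it, the peak stays at or below 8.
	fmt.Println("peak open handles:", peakOutstanding(1000, 8))
}
```

The point of the sketch is only that the producer is throttled by the consumer's progress, so the number of open handles is bounded regardless of how slow the worker is.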
OK, thanks. Works.
I still experience it (p > 1), for whatever reason. I also found another error message trying a test annotation with a huge VCF file:
runtime: memory allocated by OS (0xf69ca000) not in usable range [0x18a00000,0x98a00000)
runtime: memory allocated by OS (0xf6a9a000) not in usable range [0x18a00000,0x98a00000)
runtime: out of memory: cannot allocate 196608-byte block (1879048192 in use)
fatal error: out of memory
Are there any specific requirements in terms of size for query VCF?
Aye, the memory and open-files problems are related; it's the same problem I described above. I have seen this for p > 1 as well and I am working on a solution. There should be no limits on the query VCF size.
Hi @sigven I've fixed this and will commit shortly after I run a few more stress tests. You could try the binary here: http://home.chpc.utah.edu/~u6000771/vcfanno_09b for 64 bit linux.
If you have any troubles, then please use
export IRELATE_VERBOSE=TRUE
before running vcfanno and then paste the output (though I'm confident this is resolved). the memory use will be a function of the number of processes (-p) requested.
Great! Works excellently on a test case now.
Another thing: could you elaborate on the difference between using GOMAXPROCS and your "-p" argument?
Great. the -p argument sets the same internal parameters as GOMAXPROCS would, but vcfanno ignores/overrides GOMAXPROCS.
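As a rough sketch of what "ignores/overrides GOMAXPROCS" can look like in a Go program (this is not taken from vcfanno's source; the helper `applyProcs` and the default value of 2 are assumptions for illustration): calling `runtime.GOMAXPROCS` explicitly replaces whatever the GOMAXPROCS environment variable set at startup.

```go
package main

import (
	"flag"
	"fmt"
	"runtime"
)

// applyProcs sets the Go scheduler's parallelism to p (clamped to at least 1)
// and returns the value now in effect. An explicit call like this takes
// precedence over the GOMAXPROCS environment variable.
// Hypothetical helper for illustration only.
func applyProcs(p int) int {
	if p < 1 {
		p = 1
	}
	runtime.GOMAXPROCS(p)
	return runtime.GOMAXPROCS(0) // an argument of 0 queries without changing the setting
}

func main() {
	// The default of 2 is an assumption for this sketch, not vcfanno's default.
	p := flag.Int("p", 2, "number of processes to use")
	flag.Parse()
	fmt.Println("GOMAXPROCS in effect:", applyProcs(*p))
}
```

Run with, say, `GOMAXPROCS=8` in the environment and `-p 3` on the command line, a program written this way ends up with the value from the flag.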
Hi @brentp ,
Having downloaded the latest binaries, the too many open files problem has returned. It works fine with the Linux binary you created for me (http://home.chpc.utah.edu/~u6000771/vcfanno_09b), but not with the latest release.
best, Sigve
Hi,
I just ran a test annotation with 7 annotation VCFs against a query VCF file. I received the following error (at chromosome 16):
open cosmic/cosmic.vcf.gz: too many open files
I am curious as to why the program ends with this message.