brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5
MIT License

too many open files #9

Closed sigven closed 8 years ago

sigven commented 9 years ago

Hi,

I just ran a test annotation with 7 annotation VCF against a query VCF file. I received the following error (at chromosome 16):

open cosmic/cosmic.vcf.gz: too many open files

I am curious as to why the program ends with this message.
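For context: "too many open files" is the operating system's EMFILE error, returned when a process already holds as many file descriptors as its per-process limit allows. A minimal Go sketch for inspecting that limit on Linux or macOS follows. `openFileLimit` is a hypothetical helper name used for illustration and is not part of vcfanno.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimit returns the soft limit on open file descriptors for the
// current process (the value `ulimit -n` reports in the launching shell).
// Hypothetical helper for illustration; not part of vcfanno.
func openFileLimit() (uint64, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, err
	}
	return uint64(rl.Cur), nil
}

func main() {
	limit, err := openFileLimit()
	if err != nil {
		fmt.Println("could not read limit:", err)
		return
	}
	fmt.Println("soft limit on open files:", limit)
}
```

Raising the limit in the shell (`ulimit -n`) only postpones the error if handles keep accumulating; it does not address the cause.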

brentp commented 9 years ago

How many variants in the query? How many processes are you using via -p? I thought this was resolved, but I will try to recreate.

sigven commented 9 years ago

102k approx in query. Not specifying the number of processes, should I do that?

Basically running ./vcfanno conf.toml query.vcf.gz > query.annotated.vcf

brentp commented 9 years ago

ok. you can specify number processes with -p but I doubt that will help. I will have a look.

brentp commented 9 years ago

Actually, specifying -p > 1 does prevent the error. With only 1 thread, a bunch of work accumulates: that work involves having tabix queries ready to start iterating, but the worker can't keep up, so too many file handles accumulate (along with too much work). For now, even using -p 2 seems to resolve the problem (and you'll get a nice speedup). I'll figure out what to do with -p 1 for the next release.
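The accumulation described here is a general producer/consumer problem: if the producer opens a handle for every chunk of work before any worker has finished the previous ones, the number of open handles grows with the backlog. One common remedy is to cap outstanding handles with a counting semaphore. The sketch below simulates that in Go. It is an illustration under stated assumptions, not vcfanno's actual fix; `annotateBounded`, `maxOpen`, and the simulated "handles" are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// annotateBounded simulates a producer that "opens" one handle per chunk and
// passes it to workers. A semaphore caps how many handles may be open at once.
// It returns the peak number of simultaneously open handles.
// Hypothetical illustration; not vcfanno's implementation.
func annotateBounded(chunks, workers, maxOpen int) int64 {
	if workers < 1 {
		workers = 1
	}
	if maxOpen < 1 {
		maxOpen = 1
	}
	sem := make(chan struct{}, maxOpen) // one token per open handle
	jobs := make(chan int, chunks)      // deep queue: without sem, handles would pile up here
	var open, peak int64
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs {
				// A real worker would iterate over the query results here.
				atomic.AddInt64(&open, -1) // "close" the handle
				<-sem                      // release the token
			}
		}()
	}

	for i := 0; i < chunks; i++ {
		sem <- struct{}{}                // blocks while maxOpen handles are outstanding
		cur := atomic.AddInt64(&open, 1) // "open" a handle
		if cur > peak {                  // only the producer goroutine touches peak
			peak = cur
		}
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return peak
}

func main() {
	peak := annotateBounded(1000, 2, 8)
	fmt.Println("peak open handles:", peak)
}
```

Because the producer takes a token before opening and a worker returns it only after closing, the peak never exceeds `maxOpen`, regardless of how slow the workers are.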

sigven commented 9 years ago

OK, thanks. Works.

sigven commented 8 years ago

I still experience it (p > 1), for whatever reason. I also found another error message trying a test annotation with a huge VCF file:

runtime: memory allocated by OS (0xf69ca000) not in usable range [0x18a00000,0x98a00000)
runtime: memory allocated by OS (0xf6a9a000) not in usable range [0x18a00000,0x98a00000)
runtime: out of memory: cannot allocate 196608-byte block (1879048192 in use)
fatal error: out of memory

Are there any specific requirements in terms of size for query VCF?

brentp commented 8 years ago

Aye, the memory and open files problems are related. It's the same problem I described above. I have seen this for p > 1 as well and I am working on a solution. There should be no limits on the query VCF size.

brentp commented 8 years ago

Hi @sigven I've fixed this and will commit shortly after I run a few more stress tests. You could try the binary here: http://home.chpc.utah.edu/~u6000771/vcfanno_09b for 64-bit Linux. If you have any troubles, then please use

export IRELATE_VERBOSE=TRUE

before running vcfanno and then paste the output (though I'm confident this is resolved). The memory use will be a function of the number of processes (-p) requested.

sigven commented 8 years ago

Great! Works excellent on a test case now.

sigven commented 8 years ago

Another thing: could you elaborate on the difference between using GOMAXPROCS and your ‘-p’ argument?

brentp commented 8 years ago

Great. The -p argument sets the same internal parameters as GOMAXPROCS would, but vcfanno ignores/overrides GOMAXPROCS.

sigven commented 8 years ago

Hi @brentp ,

Having downloaded the latest binaries, the too many open files problem has returned. It works fine with the Linux binary you created for me (http://home.chpc.utah.edu/~u6000771/vcfanno_09b), but not with the latest release.

best, Sigve