dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

Truncated bcf file #178

Closed chrisfleisch closed 4 years ago

chrisfleisch commented 5 years ago

I've been using GLnexus 1.1.3 with about 27,000 WES gVCFs from DeepVariant to create a single cohort bcf, and it's been working well for months. I decided to try GLnexus 1.1.10, and while writing the bcf file it seems to consume all the memory on the server (500GB or 750GB, depending on the server I use) and the process is killed. The bcf file is left incomplete and I have to start over. I tried upgrading to GLnexus 1.1.11 and have the same problem, so I've gone back to the older version (1.1.3).

Is there something that changed in the new versions that I should be aware of? Or is there more info I could provide that would help diagnose the problem?

mlin commented 4 years ago

@chrisfleisch I'm sorry for the very slow reply to this issue; I was tied up with moving and lost this in the shuffle.

The glnexus_cli executable has a --mem-gbytes X option to set the memory budget; can you try setting that lower, to something like 100, and see how it behaves? In general, GLnexus benefits from a healthy amount of RAM, but only up to a certain point depending on the number of threads, since it also uses RocksDB for external sorting and then to page the working set in from disk. We've deployed it on servers with up to 32 hardware threads and 240G of memory, but haven't tested it much beyond that, so it's possible something weird happens at that scale.
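For reference, a capped-budget invocation might look something like the sketch below. The preset name, file paths, and thread count are placeholders to adapt to your setup; the `--config`, `--mem-gbytes`, and `--threads` flags are the documented glnexus_cli options, and the merged BCF is written to stdout.

```shell
# Sketch of a memory-capped GLnexus run (paths and preset are placeholders).
# --mem-gbytes bounds the in-memory working set; RocksDB spills the rest to disk.
glnexus_cli \
  --config DeepVariantWES \
  --mem-gbytes 100 \
  --threads 32 \
  gvcfs/*.g.vcf.gz \
  > cohort.bcf
```

If disk space for the RocksDB scratch database is a concern, note that it is created in the working directory (GLnexus.DB by default), so run from a volume with room to spare.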

chrisfleisch commented 4 years ago

Yes, using --mem-gbytes 100 seemed to fix the issue. I was able to use GLnexus 1.1.11 to write a complete bcf file this time. Thank you.