dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

out of memory in genotyping stage #277

Open Archieyoung opened 1 year ago

Archieyoung commented 1 year ago

Hi, I'm trying to run GLnexus (v1.4.1) on a large dataset of 100,000 small gVCFs, each covering 1-2 Mbp of chr1. The small gVCFs were simulated from the 2,504 gVCFs of the 1000 Genomes Project by randomly selecting gVCFs with replacement and renaming the sample in each one. I'm running GLnexus on a server with 184 GB of memory, 72 threads, an NVMe SSD, and CentOS 7.
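The post doesn't say how the simulated dataset was drawn. The sketch below shows one way to do the sampling step. It assumes GNU coreutils `shuf`, whose `-r` flag draws with replacement, and a plain-text list of the 2,504 source gVCF paths. The function name `make_sim_manifest` and the `SIM000001`-style sample names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: draw N entries *with replacement* from a list of
# source gVCF paths and give every draw its own unique sample name.
# Usage: make_sim_manifest SRC_LIST N > manifest.tsv
# Output columns: new_sample_name <TAB> source_gvcf_path
make_sim_manifest() {
  local src_list=$1 n=$2
  # GNU shuf: -r repeats input lines (sampling with replacement),
  # -n sets how many lines to emit, which may exceed the input size.
  shuf -r -n "$n" "$src_list" |
    awk '{ printf "SIM%06d\t%s\n", NR, $0 }'
}
```

Each manifest row could then drive a rename such as `bcftools reheader -s new_name.txt -o out.g.vcf.gz source.g.vcf.gz`, where `new_name.txt` holds the single new sample name. Re-indexing would follow. A unique name per draw keeps repeated copies of the same source file from sharing a sample ID.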

Here is the command I used to run GLnexus:

```
/path/to/glnexus_cli \
  --dir result-glnexus/GLnexus.DB \
  --bed target.bed --list vcfs.list \
  --threads 10 --mem-gbytes 60 |
  bcftools view --threads 12 -O z -o result-glnexus/result.vcf.gz
```
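`glnexus_cli` streams BCF straight into `bcftools`. If the first stage dies, the only visible symptom is `bcftools` complaining about truncated input. The bash sketch below wraps the same pipeline in a hypothetical `joint_call` function. It reads bash's `PIPESTATUS` array so that `glnexus_cli` reports its own exit status. `GLNEXUS` is an assumed variable that defaults to the placeholder path from the command above.

```bash
#!/usr/bin/env bash
# Sketch: GLNEXUS is an assumed variable; point it at the real glnexus_cli.
GLNEXUS="${GLNEXUS:-/path/to/glnexus_cli}"

# Hypothetical wrapper around the joint-calling pipeline.
joint_call() {
  "$GLNEXUS" \
    --dir result-glnexus/GLnexus.DB \
    --bed target.bed --list vcfs.list \
    --threads 10 --mem-gbytes 60 |
    bcftools view --threads 12 -O z -o result-glnexus/result.vcf.gz
  # Copy PIPESTATUS immediately: the next command overwrites it.
  local -a status=("${PIPESTATUS[@]}")
  if (( status[0] != 0 )); then
    # 137 = 128 + 9, i.e. the process was killed by SIGKILL,
    # which is the signal the Linux OOM killer sends.
    echo "glnexus_cli exited with status ${status[0]}" >&2
  fi
  # Succeed only if both stages of the pipeline succeeded.
  (( status[0] == 0 && status[1] == 0 ))
}
```

An exit status of 137 would be consistent with the kernel OOM killer. In that case, `dmesg` on the host would usually show a matching OOM-killer entry for the process.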

And here is the tail of the log file:

```
[23109] [2022-08-02 00:36:23.650] [GLnexus] [info] Bulk load complete!
[23109] [2022-08-02 00:36:24.345] [GLnexus] [info] found sample set @100000
[23109] [2022-08-02 00:36:24.346] [GLnexus] [info] discovering alleles in 1 range(s) on 8 threads
[23109] [2022-08-02 01:10:23.176] [GLnexus] [info] discovered 167420 alleles
[23109] [2022-08-02 01:12:22.491] [GLnexus] [info] unified to 63450 sites cleanly with 71734 ALT alleles. 1937 ALT alleles were additionally included in monoallelic sites and 12215 were filtered out on quality thresholds.
[23109] [2022-08-02 01:12:22.491] [GLnexus] [info] Finishing database compaction...
[23109] [2022-08-02 01:12:24.363] [GLnexus] [info] genotyping 63450 sites; sample set = @100000 mem_budget = 64424509440 threads = 10
ClockTime:4:24:38 ClockTimeSeconds:15878.10 CPU_Time:127822.03 CPU_percent:875% ResidentMemory:180420124
[W::bgzf_read_block] EOF marker is absent. The input is probably truncated
Error: BCF read error
```
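The log contains two memory figures, and the bash arithmetic below compares them. `mem_budget = 64424509440` is exactly 60 × 2^30 bytes, so `--mem-gbytes 60` is read as 60 GiB. The unit of `ResidentMemory` is an assumption. It appears to come from a timing wrapper, and the sketch treats it as KiB, the convention of GNU `time`'s `%M`. Under that reading, peak RSS lands close to the server's 184 GB.

```bash
#!/usr/bin/env bash
mem_gbytes=60
# --mem-gbytes is in GiB: 60 * 2^30 reproduces mem_budget from the log.
budget_bytes=$(( mem_gbytes * 1024 * 1024 * 1024 ))
# Assumption: ResidentMemory is reported in KiB.
rss_kib=180420124
rss_gib=$(( rss_kib / 1024 / 1024 ))                              # rounded down
pct_of_budget=$(( rss_kib * 100 / (mem_gbytes * 1024 * 1024) ))   # rounded down
echo "mem_budget:   $budget_bytes bytes"
echo "peak RSS:     ~$rss_gib GiB"
echo "RSS / budget: ~$pct_of_budget%"
```

This prints a budget of 64424509440 bytes, a peak RSS of about 172 GiB, and a ratio of about 286%. If the KiB assumption holds, the genotyping stage peaked at roughly 2.9 times the requested memory budget.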

I have jemalloc installed, and GLnexus is dynamically linked against the jemalloc shared library (NOT loaded via LD_PRELOAD). Here are the first lines of the log file:

```
[23109] [2022-08-01 21:07:43.920] [GLnexus] [info] glnexus_cli release v1.4.1-0-g68e25e5-dirty Aug 1 2022
[23109] [2022-08-01 21:07:43.923] [GLnexus] [info] detected jemalloc 5.3.0-0-g54eaed1d8b56b1aa528be3bdd1877e59c56fa90c
```

Any suggestions on how I should deal with this issue? Thanks! Archie