Hi,
I'm trying to run GLnexus (v1.4.1) on a large dataset of 100,000 small gVCFs (1Mbp-2Mbp of chr1). The small gVCFs were simulated from the 2,504 gVCFs of the 1000 Genomes Project by randomly selecting gVCFs with replacement and renaming the samples.
I ran GLnexus on a server with 184 GB of memory, 72 threads, an NVMe SSD, and CentOS 7.
Here is the command I used to run GLnexus:
/path/to/glnexus_cli \
--dir result-glnexus/GLnexus.DB \
--bed target.bed --list vcfs.list \
--threads 10 --mem-gbytes 60 | bcftools view --threads 12 -O z -o result-glnexus/result.vcf.gz
And here is the tail of the log file:
[23109] [2022-08-02 00:36:23.650] [GLnexus] [info] Bulk load complete!
[23109] [2022-08-02 00:36:24.345] [GLnexus] [info] found sample set @100000
[23109] [2022-08-02 00:36:24.346] [GLnexus] [info] discovering alleles in 1 range(s) on 8 threads
[23109] [2022-08-02 01:10:23.176] [GLnexus] [info] discovered 167420 alleles
[23109] [2022-08-02 01:12:22.491] [GLnexus] [info] unified to 63450 sites cleanly with 71734 ALT alleles. 1937 ALT alleles were additionally included in monoallelic sites and 12215 were filtered out on quality thresholds.
[23109] [2022-08-02 01:12:22.491] [GLnexus] [info] Finishing database compaction...
[23109] [2022-08-02 01:12:24.363] [GLnexus] [info] genotyping 63450 sites; sample set = @100000 mem_budget = 64424509440 threads = 10
ClockTime:4:24:38 ClockTimeSeconds:15878.10 CPU_Time:127822.03 CPU_percent:875% ResidentMemory:180420124
[W::bgzf_read_block] EOF marker is absent. The input is probably truncated
Error: BCF read error
I have jemalloc installed, and GLnexus is linked against the jemalloc dynamic library (NOT via LD_PRELOAD). Here are the first lines of the log file:
[23109] [2022-08-01 21:07:43.920] [GLnexus] [info] glnexus_cli release v1.4.1-0-g68e25e5-dirty Aug 1 2022
[23109] [2022-08-01 21:07:43.923] [GLnexus] [info] detected jemalloc 5.3.0-0-g54eaed1d8b56b1aa528be3bdd1877e59c56fa90c
Any suggestions on how I should deal with this issue?
Thanks!
Archie