dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

Inquiry Regarding IOError Encountered During Multiple Rounds of Merging gVCF Files Using GLnexus #313

Open zhanxiangzong opened 1 week ago

zhanxiangzong commented 1 week ago

Hi!

Thank you for providing this tool.

I encountered a minor issue while using GLnexus and would like to seek your guidance.

Initially, I used GLnexus to merge five datasets, each consisting of several hundred gVCFs, and this process was completed without any problems. However, after converting the five merged BCFs back to gVCFs and attempting to merge these five files again using GLnexus, I encountered a failure during the "discovering alleles" step.

Here are the details of the error message:

[GLnexus] [info] glnexus_cli release v1.4.1-0-g68e25e5 Aug 13 2021
[GLnexus] [info] detected jemalloc
[GLnexus] [info] Loading config preset gatk
[GLnexus] [info] config: ...
[GLnexus] [info] config CRC32C = 1926883223
[GLnexus] [info] init database, exemplar_vcf=./all_gvcf/500.gvcf.gz
[GLnexus] [info] Initialized GLnexus database in GLnexus.DB
[GLnexus] [info] bucket size: 30000
[GLnexus] [info] contigs: ...
[GLnexus] [info] db_get_contigs GLnexus.DB
[GLnexus] [info] Beginning bulk load with no range filter.
[GLnexus] [info] Loaded 5 datasets with 2776 samples; 105459735784 bytes in 8064671 BCF records (173 duplicate) in 6449 buckets. Bucket max 105752856 bytes, 5309 records. 0 BCF records skipped due to caller-specific exceptions
[GLnexus] [info] Created sample set @5
[GLnexus] [info] Flushing database...
[GLnexus] [info] Bulk load complete!
[GLnexus] [warning] Processing full length of 443 contigs, as no --bed was provided. Providing a BED file with regions of interest, if applicable, can speed this up.
[GLnexus] [info] found sample set @5
[GLnexus] [info] discovering alleles in 443 range(s) on 28 threads
[GLnexus] [error] Failed to discover alleles: IOError: exception deserializing BCF bucket (capnp/arena.c++:127: failed: Exceeded message traversal limit. See capnp::ReaderOptions. stack: 558149f798d8 558149a1e8f7 558149a2a1e5 558149a2a349 5581499efa38 5581499e7508 55814996d248 2ab467bf6996 5581499e3baa 5581499e3d02 55814997716c 558149fef47e 2ab467beefa2 2ab467d014ce)
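
To put the numbers in the log above in context, here is a minimal sketch (not GLnexus code) that parses the bulk-load summary line and compares the largest bucket with Cap'n Proto's default traversal limit of 8 * 1024 * 1024 eight-byte words (64 MiB). Two things are assumptions on my part: that GLnexus reads buckets with that default limit, and that the byte counts in the log correspond to what Cap'n Proto actually traverses. The function name `parse_summary` is made up for this illustration.

```python
import re

# Bulk-load summary line copied from the log above.
SUMMARY = (
    "Loaded 5 datasets with 2776 samples; 105459735784 bytes in 8064671 "
    "BCF records (173 duplicate) in 6449 buckets. "
    "Bucket max 105752856 bytes, 5309 records."
)

# Cap'n Proto's default ReaderOptions::traversalLimitInWords is
# 8 * 1024 * 1024 words, each word being 8 bytes.
# ASSUMPTION: GLnexus uses this default; the log does not say which limit applies.
ASSUMED_LIMIT_BYTES = 8 * 1024 * 1024 * 8


def parse_summary(line):
    """Extract total bytes, bucket count, and largest bucket size (hypothetical helper)."""
    m = re.search(
        r"(\d+) bytes in \d+ BCF records.*? in (\d+) buckets\. Bucket max (\d+) bytes",
        line,
    )
    if m is None:
        raise ValueError("unrecognized bulk-load summary line")
    total, buckets, bucket_max = map(int, m.groups())
    return {"total_bytes": total, "buckets": buckets, "bucket_max_bytes": bucket_max}


stats = parse_summary(SUMMARY)
mean_bucket = stats["total_bytes"] / stats["buckets"]

print(f"mean bucket size:    {mean_bucket / 2**20:.1f} MiB")
print(f"largest bucket:      {stats['bucket_max_bytes'] / 2**20:.1f} MiB")
print(f"assumed capnp limit: {ASSUMED_LIMIT_BYTES / 2**20:.0f} MiB")
print("largest bucket exceeds assumed limit:",
      stats["bucket_max_bytes"] > ASSUMED_LIMIT_BYTES)
```

If those assumptions hold, the largest bucket is well above the limit while the average bucket is well below it, which would fit a failure that appears only once wide multi-sample records are loaded.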

Can you help me resolve this issue? Or is it the case that GLnexus cannot merge gVCF files that already contain multiple samples?

Thank you in advance for your help!

ChaimMacTavish commented 1 week ago

I saw issues about this error from a year ago, and they still have not received a reply. Has this package been abandoned?