dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

IOError: reading gVCF header #227

Closed: zoeluo15 closed this issue 4 years ago

zoeluo15 commented 4 years ago

Hello,

I was trying to merge gVCF files using the following command:

glnexus_cli --config DeepVariantWGS -a --threads 24 vcfs/*gvcf -m 32 1>vcfs/merged.gvcf 2>err.log

(version: glnexus_v1.2.3)

Here is the error message I got:

[E::bcf_hdr_read] Input is not detected as bcf or vcf format
[190977] [2020-06-24 09:03:04.774] [GLnexus] [info] 100 (Ox.sub1)...
[190392] [2020-06-24 09:16:57.945] [GLnexus] [info] Loaded 169 datasets with 169 samples; 417362543352 bytes in 4662497125 BCF records (313 duplicate) in 6440421 buckets. Bucket max 1162624 bytes, 12613 records. 0 BCF records skipped due to caller-specific exceptions
[190392] [2020-06-24 09:16:57.946] [GLnexus] [info] Created sample set *@169
[190392] [2020-06-24 09:16:57.946] [GLnexus] [error] vcfs/merged.gvcf IOError: reading gVCF header (vcfs/merged.gvcf)
[190392] [2020-06-24 09:29:05.037] [GLnexus] [error] Failed to bulk load into DB: Failure: One or more gVCF inputs failed validation or database loading; check log for details.

Any help would be greatly appreciated!

Zoe

mlin commented 4 years ago

It's saying that at least one input file isn't recognizable as VCF or BCF at the most basic level. Possibly a compression issue? See if you can run through the tutorial successfully, then compare the example input file formats with what you've got.
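
One rough way to compare formats is to look at the first few bytes of each input. The sketch below is not part of GLnexus, and the helper name `sniff_variant_file` is made up for illustration. It relies only on two general facts: gzip and bgzip output begins with the bytes 1f 8b, and a plain-text VCF begins with `##fileformat=VCF`. It does not check whether GLnexus will accept the file.

```python
import sys


def sniff_variant_file(path):
    """Classify a file by its leading bytes (hypothetical helper, not a GLnexus API)."""
    with open(path, "rb") as fh:
        head = fh.read(32)
    if not head:
        return "empty"
    if head[:2] == b"\x1f\x8b":
        # gzip magic number; bgzip (BGZF) files share it
        return "gzip/bgzip-compressed"
    if head.startswith(b"##fileformat=VCF"):
        return "plain-text VCF"
    if head.startswith(b"BCF"):
        return "uncompressed BCF"
    return "unrecognized"


if __name__ == "__main__":
    # Usage: python sniff.py vcfs/*gvcf
    for path in sys.argv[1:]:
        print(f"{path}\t{sniff_variant_file(path)}")
```

Any file reported as "empty" or "unrecognized" would be a likely candidate for the `bcf_hdr_read` complaint.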

zoeluo15 commented 4 years ago

I think I have the correct format, but GLnexus was confused by the file name. I changed sample01.sub1.gvcf to sample01_sub1.gvcf and the problem went away. Thank you for your help!

Zoe
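
A note for anyone hitting the same error: the log above reports the failure on vcfs/merged.gvcf, which is also the file the command redirects its output to, and that name matches the vcfs/*gvcf pattern. This is only a guess from the log, not something confirmed in this thread, but an output file left over from an earlier attempt would be picked up as an input on the next run. Below is a minimal Python sketch that builds the input list while skipping the output path. The helper name `list_inputs` is hypothetical.

```python
import glob
import os


def list_inputs(pattern, output_path):
    """Expand a glob pattern, dropping the output file if the pattern matches it."""
    out = os.path.abspath(output_path)
    return sorted(p for p in glob.glob(pattern) if os.path.abspath(p) != out)


if __name__ == "__main__":
    # Paths taken from the command in the original post
    for path in list_inputs("vcfs/*gvcf", "vcfs/merged.gvcf"):
        print(path)
```

Writing the merged output to a different directory, or giving it an extension the glob does not match, would avoid the overlap without any filtering.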