dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

gVCF ordering #251

Open meghatron21 opened 3 years ago

meghatron21 commented 3 years ago

Hi.

I'm trying to merge ~500 gvcf files, but they are failing at the validation step:

[88309] [2021-03-10 04:32:58.726] [GLnexus] [info] Created sample set *@555
[88309] [2021-03-10 04:32:58.727] [GLnexus] [error] SKLJKDS.WholeGenome.g.vcf.gz Invalid: gVCF records are out-of-order (SKLJKDS..WholeGenome.g.vcf.gz 21202249 >= chr17:0--1)
[88309] [2021-03-10 04:32:58.728] [GLnexus] [error] LSJDAKSD.WholeGenome.g.vcf.gz Invalid: gVCF records are out-of-order (LSJDAKSD.WholeGenome.g.vcf.gz 41352891 >= chr19:413-412)
[88309] [2021-03-10 12:22:14.099] [GLnexus] [error] Failed to bulk load into DB: Failure: One or more gVCF inputs failed validation or database loading; check log for details.

This seems like an issue with the creation of the gVCFs, but I don't have access to the original BAM files to recreate the gVCFs. Is there a way to work around this error without completely excluding the files?

mlin commented 3 years ago

It looks to me like those gVCFs might be truncated (based on the message LSJDAKSD.WholeGenome.g.vcf.gz 41352891 >= chr19:413-412). Are you able to open the file and see what's going on around position chr19:41352891? Perhaps the file is truncated shortly after. If so, one could delete the last partial record, and GLnexus should then load what remains up to that point, but that's not an approach I could in good conscience recommend, of course.
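
One way to look for the suspected truncation is a small stand-alone script. The sketch below uses only the Python standard library. It is not GLnexus's own validation logic, and the names `iter_problems` and `check_gvcf` are hypothetical. It assumes the input is gzip- or bgzip-compressed text with tab-separated VCF records, and it flags records with fewer than 8 columns, positions that go backwards within a contig, contigs that reappear after another contig, and a compressed stream that ends early.

```python
import gzip
import sys


def iter_problems(lines):
    """Yield (line_number, message) for each suspicious record.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    seen_contigs = set()
    prev_chrom, prev_pos = None, 0
    for line_no, line in enumerate(lines, start=1):
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A VCF record has at least 8 fixed columns; POS must be an integer.
        if len(fields) < 8 or not fields[1].isdigit():
            yield (line_no, "malformed or partial record")
            continue
        chrom, pos = fields[0], int(fields[1])
        if chrom != prev_chrom:
            # Records of one contig are expected to be contiguous.
            if chrom in seen_contigs:
                yield (line_no, f"contig {chrom} reappears")
            seen_contigs.add(chrom)
        elif pos < prev_pos:
            # Equal positions are allowed; only a decrease is flagged.
            yield (line_no, f"{chrom}:{pos} follows {chrom}:{prev_pos}")
        prev_chrom, prev_pos = chrom, pos


def check_gvcf(path):
    """Collect problems for one compressed gVCF, including early end of stream."""
    problems = []
    try:
        with gzip.open(path, "rt") as handle:
            for problem in iter_problems(handle):
                problems.append(problem)
    except EOFError:
        # Raised by the gzip module when the compressed data stops
        # before the end-of-stream marker.
        problems.append((None, "compressed stream ends early (file likely truncated)"))
    return problems


if __name__ == "__main__":
    for gvcf_path in sys.argv[1:]:
        for line_no, message in check_gvcf(gvcf_path):
            print(f"{gvcf_path}\tline {line_no}\t{message}")
```

Saved under a name of your choosing (say `check_gvcf.py`), it could be run as `python check_gvcf.py LSJDAKSD.WholeGenome.g.vcf.gz`. A clean result does not prove the file is complete: a bgzip file cut exactly at a block boundary would not raise `EOFError`, so it is still worth comparing the last reported contig and position against what you expect for a whole genome.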