dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

Failed to genotype: Invalid: genotyper #269

Open meghatron21 opened 2 years ago

meghatron21 commented 2 years ago

Hi, I'm getting the following error when trying to combine GVCF files:

[1779450] [2021-08-04 13:25:45.041] [GLnexus] [error] Failed to genotype: Invalid: genotyper: unexpected result when fetching record FORMAT field (file.WholeGenome <12>:89333621-89333621 SB vector length 1, expected 4)

I can see that it is caused by an inconsistency in the file at this row (see attached). However, I don't know whether there is a way to still complete the genotyping while ignoring this line, or whether I have to manually edit the individual gVCF.

Thanks, Meghana

[Attached screenshot: Screen Shot 2021-08-09 at 3 53 23 PM]

mlin commented 2 years ago

Is the screenshot cut off? I see SB starting 16,12,12 (?) but I'm not sure if there's something after that. I would feel a lot better understanding why the record isn't conforming to the usual pattern, rather than having GLnexus ignore the discrepancy (assuming there is one).
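If you want to locate and inspect the non-conforming record(s) rather than skip them, a small scanner over the gVCF text can list every SB value that does not have the expected number of elements. The Python sketch below is not part of GLnexus; it assumes a GATK-style gVCF in which SB is declared `Number=4`, reads plain or bgzipped files using only the standard library, and `find_bad_sb` / `open_vcf` are hypothetical helper names.

```python
import gzip
import sys

# Assumption: GATK declares the per-sample SB (StrandBiasBySample) FORMAT field
# with Number=4: forward/reverse read counts for the REF and ALT alleles.
EXPECTED_SB_LEN = 4


def open_vcf(path):
    """Open a plain or bgzipped (.gz) VCF/gVCF as text; BGZF is gzip-compatible."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def find_bad_sb(lines, expected=EXPECTED_SB_LEN):
    """Yield (line_no, chrom, pos, sample_index, sb) for SB values whose
    comma-separated length differs from `expected`."""
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns
        keys = fields[8].split(":")
        if "SB" not in keys:
            continue  # e.g. reference blocks without SB
        sb_idx = keys.index("SB")
        for sample_idx, sample in enumerate(fields[9:]):
            values = sample.split(":")
            if sb_idx >= len(values):
                continue  # trailing FORMAT fields may be omitted
            sb = values[sb_idx]
            # A bare "." (missing) also has one element, so it is reported too.
            if len(sb.split(",")) != expected:
                yield line_no, fields[0], fields[1], sample_idx, sb


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_vcf(sys.argv[1]) as fh:
        for line_no, chrom, pos, sample_idx, sb in find_bad_sb(fh):
            print(f"line {line_no}: {chrom}:{pos} sample #{sample_idx} SB={sb}")
```

For example, piping the script's output through `grep 89333621` would narrow it to the position reported in the error. Reporting a bare `.` as well is deliberate: a missing value also has a single element, so it may be worth checking alongside genuinely short SB vectors.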

meghatron21 commented 2 years ago

Hi, Mike. There should be 5 lines. I can't share the full VCF for patient privacy reasons, but here is the full error message:

[1779891] [2021-08-03 06:35:36.484] [GLnexus] [info] 500/598 (SJBALL030541_G1.WholeGenome)...
[1779450] [2021-08-03 10:19:35.642] [GLnexus] [info] Loaded 598 datasets with 598 samples; 14665256630064 bytes in 147314929266 BCF records (48041046 duplicate) in 58550421 buckets. Bucket max 2217432 bytes, 22548 records. 44087 BCF records skipped due to caller-specific exceptions
[1779450] [2021-08-03 10:19:35.768] [GLnexus] [info] Created sample set @598
[1779450] [2021-08-03 10:19:35.769] [GLnexus] [info] Flushing database...
[1779450] [2021-08-03 10:19:50.050] [GLnexus] [info] Bulk load complete!
[1779450] [2021-08-03 10:19:50.161] [GLnexus] [warning] Processing full length of 194 contigs, as no --bed was provided. Providing a BED file with regions of interest, if applicable, can speed this up.
[1779450] [2021-08-03 10:19:50.178] [GLnexus] [info] found sample set @598
[1779450] [2021-08-03 10:19:50.179] [GLnexus] [info] discovering alleles in 194 range(s) on 8 threads
[1779450] [2021-08-03 20:23:31.905] [GLnexus] [info] discovered 150354645 alleles
[1779450] [2021-08-03 20:26:25.116] [GLnexus] [info] unified to 56016253 sites cleanly with 64588264 ALT alleles. 900147 ALT alleles were additionally included in monoallelic sites and 11419376 were filtered out on quality thresholds.
[1779450] [2021-08-03 20:26:25.117] [GLnexus] [info] Finishing database compaction...
[1779450] [2021-08-03 20:26:28.596] [GLnexus] [info] genotyping 56016253 sites; sample set = *@598 mem_budget = 64424509440 threads = 10
[1779450] [2021-08-04 13:25:45.041] [GLnexus] [error] Failed to genotype: Invalid: genotyper: unexpected result when fetching record FORMAT field (SJBALL030976_G1.WholeGenome <12>:89333621-89333621 SB vector length 1, expected 4)

meghatron21 commented 2 years ago

Hi, Mike. I figured it out: I re-downloaded that specific gVCF file and it worked. I'm now having a different file-write issue:

[E::bgzf_flush] File write failed (wrong size)
[E::bgzf_close] File write failed
[494344] [2021-08-19 01:34:09.354] [GLnexus] [error] Failed to genotype: IOError: bcf_write (-)

Is this specific to GLnexus or bcftools?

mlin commented 2 years ago

Usually this indicates that bcftools exited early for some reason, and GLnexus is just noticing that bcftools "hung up" on it. To troubleshoot, you can try redirecting GLnexus' output into a temporary .bcf file instead of piping it into bcftools, then running bcftools on that .bcf file to see whether it gives a useful error message.
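The "hung up" behaviour can be reproduced in miniature. In the Python sketch below (an illustration of ordinary POSIX pipe semantics, not GLnexus or bcftools code), a writer streams records into a consumer process that reads one line and then exits; the writer only learns this when a later write fails with `BrokenPipeError`, which is analogous to GLnexus reporting a failed `bcf_write` after bcftools has gone away. The consumer one-liner and `feed_consumer` are hypothetical stand-ins.

```python
import subprocess
import sys

# Hypothetical stand-in for a downstream tool such as bcftools: it reads one
# line from stdin and then exits with an error before consuming the rest.
EARLY_EXIT_CONSUMER = "import sys; sys.stdin.readline(); sys.exit(1)"


def feed_consumer(n_lines=1_000_000):
    """Stream n_lines records into the consumer; return (consumer_rc, write_error)."""
    consumer = subprocess.Popen(
        [sys.executable, "-c", EARLY_EXIT_CONSUMER],
        stdin=subprocess.PIPE,
        bufsize=0,  # unbuffered, so close() has nothing left to flush
    )
    write_error = None
    try:
        for i in range(n_lines):
            # Once the pipe buffer is full, this blocks until the consumer
            # reads more or exits; after it exits, the write fails.
            consumer.stdin.write(f"record {i}\n".encode())
    except BrokenPipeError as exc:
        # Python ignores SIGPIPE by default, so writing to a pipe whose reader
        # has exited raises BrokenPipeError instead of killing this process.
        write_error = exc
    finally:
        consumer.stdin.close()
    return consumer.wait(), write_error


if __name__ == "__main__":
    rc, err = feed_consumer()
    print(f"consumer exit code: {rc}; writer saw: {err!r}")
```

Writing to a regular file instead removes that dependency: GLnexus can finish its output regardless of what bcftools does, and bcftools then reports its own error directly when run on the file.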