dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

Error message: PL vector length 2, expected 3 - did the job fail? #259

Open antoniocampos13 opened 3 years ago

antoniocampos13 commented 3 years ago

I tested the GLnexus Docker image and tried to combine just two of my WGS gVCF samples to familiarize myself with the program. I got this error message:

Failed to genotype: Invalid: genotyper: unexpected result when fetching record FORMAT field ([FILE-NAME] <22>:2781480-2781529 PL vector length 2, expected 3)

However, I could still convert the output BCF to a VCF as instructed on the Get Started page. I see that chrX variants are present in the VCF, but no chrY or alt contigs. Does that mean the job failed completely, or was this specific variant just skipped/filtered and everything else is OK? Does the same apply to the alt contigs?
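
One quick way to see how far the run got is to count records per contig in the converted VCF. The helper below is a minimal sketch, not part of GLnexus: the function name `records_per_contig` is made up for illustration, and it assumes a plain-text or gzip/bgzip-compressed VCF.

```python
import gzip
from collections import Counter


def records_per_contig(vcf_path):
    """Count data records per contig (CHROM column) in a VCF file.

    Hypothetical helper for illustration; accepts plain text or
    gzip/bgzip-compressed VCF (chosen by the ".gz" suffix).
    """
    opener = gzip.open if vcf_path.endswith(".gz") else open
    counts = Counter()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            # Skip header lines ("##..." and "#CHROM...") and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom = line.split("\t", 1)[0]
            counts[chrom] += 1
    return counts
```

Calling it on the converted file and printing the result shows which contigs have any records at all. Contigs that come after the failing position and have zero records would suggest the run stopped there rather than skipping a single site.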

Thanks

mlin commented 3 years ago

Which upstream variant caller generated the gVCF? It sounds like a male sample where the X genotypes are being analyzed as haploid, is that true? We may be out of sync with the upstream caller's representation of that (as they've been evolving/improving over time).

I would not recommend using an incomplete output, although it's likely that the autosomal calls (before it got to that point) are complete.

antoniocampos13 commented 3 years ago

Thanks for the feedback Mike. Indeed, it is one gVCF from a male and one from a female. We used Illumina's DRAGEN to generate the gVCFs. Yes, it treats the non-PAR X region as haploid. I assumed --config gatk was the closest to DRAGEN's. Should I change any configuration to account for that?
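
The numbers in the error message are consistent with a haploid record being read as diploid. The number of PL entries equals the number of unordered genotypes, which depends on ploidy and allele count. The sketch below only illustrates that arithmetic; the function name is made up, and the assumption that the failing record has exactly two alleles (for example REF plus `<NON_REF>`) is inferred from the "length 2, expected 3" message, not confirmed in the thread.

```python
from math import comb


def expected_pl_length(n_alleles, ploidy):
    """Number of PL entries for a site with n_alleles (REF + ALTs) at a given ploidy.

    Unordered genotypes are multisets of size `ploidy` drawn from `n_alleles`
    alleles, so the count is C(n_alleles + ploidy - 1, ploidy).
    """
    return comb(n_alleles + ploidy - 1, ploidy)


# Two alleles (assumed, e.g. REF and <NON_REF>):
print(expected_pl_length(2, 2))  # diploid: 0/0, 0/1, 1/1 -> 3
print(expected_pl_length(2, 1))  # haploid: 0, 1 -> 2
```

So a genotyper that assumes diploid calls would expect 3 values where a haploid non-PAR X record supplies 2.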

mlin commented 3 years ago

@antoniocampos13 Thanks for the info -- I just recalled there's an outstanding PR on this repo by @VorontsovIE to handle the DRAGEN haploid calls, but I haven't merged it because I lack test data suitable for including in the software's test suite. As it's been a while since I've looked, are you perchance aware of any public, unencumbered DRAGEN gVCFs that would be suitable for creating test cases?

antoniognmk commented 3 years ago

Hi Mike. I checked the PR you mentioned. I built the glnexus_cli executable via Docker with the files from the PR. It worked well!

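A summary of the steps I followed in case it helps anyone else:

  • Built the glnexus_cli executable via Docker with the files from @VorontsovIE's PR
  • Used the gatk_unfiltered config: no errors occurred
  • Imported the output VCF into Hail (I tried GLnexus because Hail's run_combiner() was not working for me, since the data had haploid calls, but that is another story)
  • Checked a random non-PAR X chromosome locus: the haploid calls had been transformed into diploid calls
  • Converted them back into haploid calls with the gnomAD Hail utilities (adjust_sex_ploidy())
  • Re-calculated allele frequencies with Hail's annotate_rows() and agg.call_stats() to account for the ploidy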

I have not checked the mitochondrial genome yet, but I believe my problems are solved for now. Thanks again Mike, thanks @VorontsovIE, also for the outstanding PR.

Unfortunately I am not aware of any public DRAGEN gVCF, but I will let you know if I find any.

yangyxt commented 1 year ago

Hi Mike. I checked the PR you mentioned. I built the glnexus_cli executable via Docker with the files from the PR. It worked well!

A summary of the steps I followed in case it helps anyone else:

  • Built the glnexus_cli executable via Docker with the files from @VorontsovIE's PR
  • Used the gatk_unfiltered config: no errors occurred
  • Imported the output VCF into Hail (I tried GLnexus because Hail's run_combiner() was not working for me, since the data had haploid calls, but that is another story)
  • Checked a random non-PAR X chromosome locus: the haploid calls had been transformed into diploid calls
  • Converted them back into haploid calls with the gnomAD Hail utilities (adjust_sex_ploidy())
  • Re-calculated allele frequencies with Hail's annotate_rows() and agg.call_stats() to account for the ploidy
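
The last two steps can be illustrated without Hail. The plain-Python sketch below is not the gnomAD or Hail implementation. It assumes a simple rule for males on non-PAR X (a homozygous diploid call becomes a single-allele call, a heterozygous call becomes missing), and the sample names and genotypes are invented.

```python
from collections import Counter


def adjust_male_nonpar_x(gt):
    """Turn a diploid call into a haploid one for a male at a non-PAR X locus.

    `gt` is a tuple of allele indices, or None for a missing call.
    Assumed rule: hom -> single allele, het -> missing.
    """
    if gt is None or len(gt) == 1:
        return gt
    if len(set(gt)) > 1:  # heterozygous: cannot be read as haploid
        return None
    return (gt[0],)


def call_stats(genotypes, n_alleles):
    """Allele counts (AC, index 0 = REF), allele number (AN) and frequencies (AF)."""
    counts = Counter(a for gt in genotypes if gt is not None for a in gt)
    an = sum(counts.values())
    ac = [counts.get(i, 0) for i in range(n_alleles)]
    af = [c / an if an else None for c in ac]
    return {"AC": ac, "AN": an, "AF": af}


# Invented example: one male and two females at a biallelic non-PAR X locus.
sex = {"male_1": "male", "female_1": "female", "female_2": "female"}
calls = {"male_1": (1, 1), "female_1": (0, 0), "female_2": (0, 1)}

before = call_stats(list(calls.values()), 2)
adjusted = [
    adjust_male_nonpar_x(gt) if sex[sample] == "male" else gt
    for sample, gt in calls.items()
]
after = call_stats(adjusted, 2)

print(before)  # AN is 6 while the male call is still diploid
print(after)   # AN is 5 once the male call is haploid
```

In this toy case the ALT allele frequency drops from 0.5 to 0.4 after the adjustment, which is why the frequencies need to be recomputed once ploidy is corrected.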

I have not checked the mitochondrial genome yet, but I believe my problems are solved for now. Thanks again Mike, thanks @VorontsovIE, also for the outstanding PR.

Unfortunately I am not aware of any public DRAGEN gVCF, but I will let you know if I find any.

Thank you for the info. Since it has been almost a year, I wonder whether the current GLnexus Docker image includes the changes from the PR.