dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

Genotyping GVCFs from other vendors #145

Open raonyguimaraes opened 5 years ago

raonyguimaraes commented 5 years ago

Hi there,

I have a question regarding genotyping GVCFs that were generated from other vendors.

For example, can I use GVCFs from Sentieon to jointly genotype variants using GLnexus?

What impact would this have on the variant discovery process?

Thank you for your attention!

mlin commented 5 years ago

Further to the reply on #146, there are two considerations:

  1. Essential aspects of the GVCF format: how variants and reference coverage are represented, what the fields are named, how scores & metrics are scaled, etc. This determines whether you can run the files through GLnexus and get a well-formed project VCF out at all.
  2. The desired sensitivity/specificity trade-off in joint genotyping. In very large cohort studies, you don't want to keep every single borderline/low-quality variant found in every sample's GVCF, because this causes the project VCF size to explode impractically. But in smaller studies, this isn't such a big deal.

In the case of Sentieon's (excellent) tools, the gatk_unfiltered config should be suitable for maximum sensitivity, while the gatk config will be more practical for N greater than, say, a thousand (there's no hard cutoff -- that's just a gut feeling).
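As a rough illustration of that rule of thumb, here is a minimal Python sketch that picks between the two preset names by cohort size and assembles a `glnexus_cli` command line. The helper names and the exact threshold of 1000 are assumptions made for illustration (the cutoff above is only a gut feeling), and the sample file names are placeholders; check `glnexus_cli --help` in your installed version before relying on the flags.

```python
import shlex

# Assumed threshold, taken from the "say, a thousand" gut feeling above.
# It is not a hard cutoff; tune it for your own project.
LARGE_COHORT_THRESHOLD = 1000


def choose_glnexus_config(n_samples, threshold=LARGE_COHORT_THRESHOLD):
    """Pick a preset name for GATK-style gVCFs (hypothetical helper)."""
    if n_samples < 1:
        raise ValueError("need at least one gVCF")
    # Larger cohorts: prefer the filtered preset to keep the project VCF manageable.
    # Smaller cohorts: keep everything for maximum sensitivity.
    return "gatk" if n_samples > threshold else "gatk_unfiltered"


def build_glnexus_command(gvcf_paths):
    """Return the argument list for a glnexus_cli run (hypothetical helper)."""
    config = choose_glnexus_config(len(gvcf_paths))
    return ["glnexus_cli", "--config", config, *gvcf_paths]


if __name__ == "__main__":
    # Placeholder file names; the command is only printed, not executed.
    cmd = build_glnexus_command(["sampleA.g.vcf.gz", "sampleB.g.vcf.gz"])
    print(shlex.join(cmd))
```

The sketch only builds the command; when run for real, `glnexus_cli` writes the merged result to standard output, so you would normally redirect it to a file.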

For other tools that produce their own GVCF flavors that aren't effectively identical to something GLnexus already supports, unfortunately there's no way around some painstaking work to catalog the unique features, write a suitable configuration, and calibrate it using test data. We currently have this in progress for DeepVariant and Strelka2 (they're marked "experimental").