broadinstitute / tgg_methods

Repo for miscellaneous methods developed by the methods group that don't fit anywhere else
MIT License

Evaluate further returned GP/DSP quality metrics and GQ #88

Open matren395 opened 2 months ago

matren395 commented 2 months ago

As they are returned, evaluate the other general quality metrics from GP and DSP, such as contamination, coverage, and any other Picard metrics. Will update this ticket as I learn more about what will be delivered!

matren395 commented 2 months ago

Expanding the scope to include Hana's GQ inquiry on this ticket:

https://broadinstitute.enterprise.slack.com/archives/CG83SEN1Z/p1712760147758679?thread_ts=1711741176.499129&cid=CG83SEN1Z

matren395 commented 2 months ago

Text:

So we created an accidental natural experiment for this in seqr because one of the Gregor sites loaded a DRAGEN VCF with all the data we had submitted to Gregor, including many RGP families. Yesterday I spot checked a discovery variant Stephanie found in an RGP family and found that while in our GATK callset it had a GQ of 99, in the DRAGEN callset it has a GQ of 33: https://seqr.broadinstitute.org/summary_data/variant_lookup?genomeVersion=38&variantId=18-35067754-A-G

Looking into the research you did here, it looks like the histogram for GQ values looks really similar for the DRAGEN and GATK data, in that they both have a huge spike around 40 and a smaller spike around 100. However, I wonder if there's a difference in the distributions if we break it down a bit differently. Could you run a couple of other comparisons for GQ distribution with the following adjustments:

- Remove the X chromosome - Stephanie mentioned that in GATK, males on the X chromosome have a ton of GQs between 30 and 40, so that might be skewing our numbers
- Look at the GQ distribution for non-ref calls only, to see if there's any difference there

It's possible that it was just bad luck and I happened to spot check the one example variant where the GQ dropped off like this, but I would really want to confirm this isn't a sign of something more severe

matren395 commented 2 months ago

Fun news: it is good that we investigated this! As it turns out, for autosomal non-ref calls, GQ is much lower for DRAGEN-called data than for our GATK-called example. Shown in the attached PDFs:

- Seqr GATK and DRAGEN GQ Comparison Autosomal NonRef GQ 20240411.pdf
- Seqr GREGoR Sanity Checks - Autosomal NonRef GQ 20240411.pdf
- Seqr_GATK_DRAGEN_GQ_graphs.pdf
- Seqr GATK and DRAGEN GQ Comparison.pdf
- Seqr DRAGEN Sanity Checks - 20240411.pdf
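
For anyone wanting to reproduce the breakdown requested above (drop the X chromosome, keep non-ref calls only), here is a minimal sketch of the filtering and binning. It is not the code behind the attached PDFs. It assumes the calls have already been pulled out of each callset as plain `(contig, genotype, gq)` tuples; that record layout and the function names `is_non_ref` and `gq_histogram` are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Assumed record layout (hypothetical): (contig, genotype, gq),
# e.g. ("chr18", "0/1", 99). Contigs are accepted with or without "chr".
AUTOSOMES = {f"chr{i}" for i in range(1, 23)} | {str(i) for i in range(1, 23)}


def is_non_ref(gt):
    """Return True if at least one called allele is non-reference."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def gq_histogram(calls, bin_width=10):
    """Count autosomal non-ref calls per GQ bin, keyed by bin lower bound."""
    hist = Counter()
    for contig, gt, gq in calls:
        # Skip chrX/chrY/chrM, calls with missing GQ, and hom-ref/no-calls.
        if contig not in AUTOSOMES or gq is None or not is_non_ref(gt):
            continue
        hist[(gq // bin_width) * bin_width] += 1
    return dict(sorted(hist.items()))
```

Running `gq_histogram` once per callset (GATK and DRAGEN) on the same samples gives two binned distributions that can be plotted side by side. Binning by lower bound keeps the spikes near 40 and 100 mentioned in the thread easy to spot.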