broadinstitute / tgg_methods

Repo for miscellaneous methods developed by the methods group that don't fit anywhere else
MIT License

Pairwise investigation of GQ and other metrics between prior GATK/WARP RGP callset and RGP samples in DRAGEN-called GREGoR callset #85

Closed matren395 closed 1 week ago

matren395 commented 3 months ago

As per the title: a pairwise investigation of GQ and other metrics between the prior GATK/WARP RGP callset and the RGP samples in the DRAGEN-called GREGoR callset. Hana Snow raised the concern that some variants in individuals have notably different GQ scores between the previously loaded RGP data (GATK/WARP) and the new DRAGEN-called GREGoR callset.

This is also a chance to investigate other differences between GATK and DRAGEN with this callset. Thinking out loud: for individuals with solves or diagnostic variants in the prior GATK data, do those variants (1) appear and (2) have good quality in the new DRAGEN-called data?
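A minimal sketch of that check, in plain Python rather than whatever pipeline ends up being used. Everything here is an assumption for illustration: the `check_known_variants` name, the `(sample, chrom, pos, ref, alt)` key layout, the entry fields, and especially the `min_gq=20` cutoff, which is a placeholder and not a threshold taken from this issue.

```python
def check_known_variants(known, dragen, min_gq=20):
    """Classify each known diagnostic variant by how it shows up in DRAGEN.

    known:  iterable of (sample, chrom, pos, ref, alt) keys (hypothetical layout)
    dragen: dict mapping the same keys to {"GT": (a1, a2), "GQ": int}
    min_gq: placeholder cutoff for "good quality" (assumption, not from the issue)
    """
    report = {}
    for key in known:
        entry = dragen.get(key)
        # Treat an absent entry or a call with no alt allele as "missing".
        if entry is None or not any((a or 0) > 0 for a in entry["GT"]):
            report[key] = "missing"
        elif entry["GQ"] < min_gq:
            report[key] = "low_gq"
        else:
            report[key] = "ok"
    return report
```

Anything reported as `missing` or `low_gq` would be a candidate for manual review before deciding whether the difference matters.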

matren395 commented 3 months ago

this is the expanded version of: https://app.zenhub.com/workspaces/tgg-methods-6613e5b68c36e00025172757/issues/gh/broadinstitute/tgg_methods/84

matren395 commented 3 months ago

Update: this is being moved down a step in priority. The pairwise work does not need to be done before the data is loaded, since this is an AnVIL request from the GREGoR team. Other QC work is moving ahead, though.

matren395 commented 3 months ago

I will be posting some very interesting plots on the matter come next week, but a high-level discovery is:

PER-ENTRY (GQ>0, AD>0) DISTRIBUTION OF GQ DIFFERENCES (DRAGEN.GQ - GATK.GQ) FOR AUTOSOMAL NON-REF ENTRIES:

image
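For reference, a rough sketch of how a per-entry difference like the one plotted could be computed. This is not the code behind the plot. It assumes each callset has been flattened to a dict keyed by `(sample, chrom, pos, ref, alt)`, that "AD>0" means total allelic depth above zero, and that an entry must pass the filter in both callsets; all names are hypothetical.

```python
AUTOSOMES = {str(i) for i in range(1, 23)}


def is_autosomal(chrom):
    """True for chromosomes 1-22, with or without a 'chr' prefix."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return name in AUTOSOMES


def passes(entry):
    """Non-ref genotype with GQ > 0 and total AD > 0 (assumed reading of the filter)."""
    return (
        any((a or 0) > 0 for a in entry["GT"])
        and entry["GQ"] > 0
        and sum(entry["AD"]) > 0
    )


def gq_differences(gatk, dragen):
    """Return sorted DRAGEN.GQ - GATK.GQ for entries present and passing in both callsets."""
    diffs = []
    for key in gatk.keys() & dragen.keys():
        chrom = key[1]
        g, d = gatk[key], dragen[key]
        if is_autosomal(chrom) and passes(g) and passes(d):
            diffs.append(d["GQ"] - g["GQ"])
    return sorted(diffs)
```

Negative values mean the DRAGEN GQ is lower than the GATK GQ for the same variant in the same individual. The returned list can be passed straight to a histogram.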

matren395 commented 1 week ago

This was resolved (1: confirmed that HET GQ scores are lower for the SAME variant in the same individual; 2: updated our seqr querying filters to match), and the data was loaded!