broadinstitute / tgg_methods

Repo for miscellaneous methods developed by the methods group that don't fit anywhere else
MIT License

Basic Sanity Check QC of DRAGEN-called GREGoR Callset #84

Closed: matren395 closed this issue 5 months ago

matren395 commented 5 months ago

Run basic checks on the DRAGEN-called GREGoR callset to confirm that it is data we can load into Seqr, and data we should be loading into Seqr. How do the FILTER field, ploidy, and basic QC stats look?

matren395 commented 5 months ago

Fun facts: VETS filters are absent from the FILTER field here too, replicating what we've seen before. All calls are diploid. Interesting!
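The two checks in the comment above (what the FILTER field contains, and the ploidy of the calls) can be sketched in pure Python over VCF text lines. This is a hypothetical illustration, not the code used for this QC; it assumes an uncompressed VCF whose FORMAT column includes GT, and the function name `filter_and_ploidy_summary` is made up for this example. It does not look for specific VETS filter names; any filters that had been applied would simply show up as extra keys in the FILTER tally.

```python
from collections import Counter


def filter_and_ploidy_summary(vcf_lines):
    """Tally FILTER values and GT ploidy per contig from uncompressed VCF lines."""
    filters = Counter()
    ploidy = Counter()  # keys are (contig, number of alleles in GT)
    for line in vcf_lines:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        contig, filt = fields[0], fields[6]
        # FILTER is "PASS", "." (no filters applied), or semicolon-separated names
        for name in filt.split(";"):
            filters[name] += 1
        gt_idx = fields[8].split(":").index("GT")
        for sample in fields[9:]:
            values = sample.split(":")
            if gt_idx >= len(values):
                continue
            # "1" is a haploid call; "0/1" or "1|1" are diploid calls
            n_alleles = len(values[gt_idx].replace("|", "/").split("/"))
            ploidy[(contig, n_alleles)] += 1
    return filters, ploidy


records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:GQ\t0/1:40\t1/1:99",
    "chrX\t200\t.\tC\tT\t50\t.\t.\tGT:GQ\t1:99\t0/1:45",
]
filters, ploidy = filter_and_ploidy_summary(records)
print(sorted(ploidy.items()))
# [(('chr1', 2), 2), (('chrX', 1), 1), (('chrX', 2), 1)]
```

A callset containing haploid calls produces keys such as `('chrX', 1)`; if every key has ploidy 2, including on chrX and chrY, the callset is all diploid, as observed here.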

matren395 commented 5 months ago

Concluded! From an email to Steve Jahl and Julia Goodrich:

--> Received an Anvil loading request for 1420 DRAGEN-called GREGoR samples from our collaborators, and I ran some basic QC on it. The request includes 302 known RGP samples that were previously called with GATK/WARP, which gives us some unique QC opportunities.
-----> We see the same VETS annotation behavior here.
-----> All other simple checks look above board!
-----> On Hana's recommendation, I also QC-ed the GQ scores, which turned up a very interesting finding. The distribution of GQ scores for autosomal non-reference calls is very different from what we've seen in GATK/WARP data: many DRAGEN GQ scores cluster around ~40 and then ~99, whereas the GATK/WARP data we checked has no cluster around ~40, suggesting the GATK/WARP calls are of higher quality. I checked this both 1) across the whole new DRAGEN GREGoR Anvil callset and 2) within only the 302 samples shared between the two callsets. The same pattern also shows up in the new INTERNAL DRAGEN callset. Novel!
-----> But somehow, their callset is all diploid, even for all samples on all sex chromosomes? According to our GP friends, this isn't possible. We're reaching out to ask whether this was also called with GVS. Huh!
-----> Marching orders are to load the Anvil request and give users a heads-up that 1) they should be very generous with GQ filters, and 2) VETS variant QC filters aren't yet available for querying in this callset. We will reload it with them included once the implementation goes live.
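The GQ comparison described in the email can be sketched as a binned GQ histogram of autosomal non-reference calls, optionally restricted to a set of sample IDs (for example, the 302 samples shared between the DRAGEN and GATK/WARP callsets). This is a hypothetical pure-Python illustration, not the code used for this analysis. It assumes an uncompressed VCF with a `#CHROM` header line, `chr`-prefixed contig names, and GT and GQ in FORMAT; the function name `nonref_gq_histogram` and the 10-point bin width are arbitrary choices for the example.

```python
from collections import Counter

AUTOSOMES = {f"chr{i}" for i in range(1, 23)}  # assumes "chr"-prefixed contigs


def nonref_gq_histogram(vcf_lines, samples=None, bin_width=10):
    """Binned GQ counts for autosomal non-reference calls.

    If `samples` is given, only those sample columns are counted.
    Assumes the "#CHROM" header line comes before any data line.
    """
    hist = Counter()
    columns = []
    for line in vcf_lines:
        if line.startswith("##"):  # skip meta-information lines
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start at index 9; keep only the requested samples
            columns = [i for i, name in enumerate(fields[9:], start=9)
                       if samples is None or name in samples]
            continue
        if fields[0] not in AUTOSOMES:
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt or "GQ" not in fmt:
            continue
        gt_idx, gq_idx = fmt.index("GT"), fmt.index("GQ")
        for i in columns:
            values = fields[i].split(":")
            if max(gt_idx, gq_idx) >= len(values) or values[gq_idx] == ".":
                continue
            alleles = values[gt_idx].replace("|", "/").split("/")
            # Non-reference: at least one called allele that is not "0"
            if not any(a not in ("0", ".") for a in alleles):
                continue
            gq = int(values[gq_idx])
            hist[gq // bin_width * bin_width] += 1
    return hist
```

Running it once per callset with the same `samples` set and comparing the histograms bin by bin is one way to see whether a cluster around GQ ~40 appears in one callset but not the other.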

matren395 commented 5 months ago

This GQ difference is almost entirely driven by HET variants, interestingly enough! The GQ distribution of HOM variants doesn't change much.
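Splitting the comparison by genotype class, as in the comment above, can be sketched by computing, separately for HET and HOM_VAR calls, the fraction with GQ below a cutoff. This is again a hypothetical pure-Python illustration under the same VCF assumptions, not the code used here. The function names are made up, and the cutoff of 60 is an arbitrary value chosen to fall between the ~40 and ~99 clusters, not a recommended filter. Unlike the email's analysis, this sketch does not restrict to autosomes.

```python
from collections import defaultdict


def classify_gt(gt):
    """Return "HET", "HOM_VAR", "HOM_REF", or None for a VCF GT string."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None  # this sketch ignores missing and non-diploid calls
    a, b = alleles
    if a == b:
        return "HOM_REF" if a == "0" else "HOM_VAR"
    return "HET"  # includes both 0/1 and 1/2 calls


def low_gq_fraction_by_class(vcf_lines, low_gq=60):
    """Fraction of HET and HOM_VAR calls whose GQ is below `low_gq`."""
    totals = defaultdict(int)
    low = defaultdict(int)
    for line in vcf_lines:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        fmt = fields[8].split(":")
        if "GT" not in fmt or "GQ" not in fmt:
            continue
        gt_idx, gq_idx = fmt.index("GT"), fmt.index("GQ")
        for sample in fields[9:]:
            values = sample.split(":")
            if max(gt_idx, gq_idx) >= len(values) or values[gq_idx] == ".":
                continue
            cls = classify_gt(values[gt_idx])
            if cls not in ("HET", "HOM_VAR"):
                continue
            totals[cls] += 1
            if int(values[gq_idx]) < low_gq:
                low[cls] += 1
    return {cls: low[cls] / totals[cls] for cls in totals}
```

If HET calls carry the low-GQ cluster while HOM_VAR calls do not, the HET fraction comes out much larger than the HOM_VAR fraction, which is the pattern the comment describes.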