ldgauthier opened this issue 7 years ago
Thanks, Laura, for adding the ticket! Quick correction: the largest distance I observed between a pair of heterozygous variants in the same phase group (same PID) was 219 bp, and that was in a phase group containing multiple heterozygous variants.
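To repeat a measurement like this on another callset, the span of each phase group can be computed directly from the PID values. This is a minimal sketch, assuming the calls have already been parsed into hypothetical `(pos, GT, PID)` tuples (with `None` for a missing PID). The function name and the toy records are illustrative only, not part of GATK.

```python
from collections import defaultdict

def max_span_per_phase_group(records):
    """Return {PID: (min_pos, max_pos, span)} for heterozygous calls that carry a PID.

    records: iterable of (pos, gt, pid) tuples; pid is None when the PID is missing.
    """
    positions = defaultdict(list)
    for pos, gt, pid in records:
        # Treat phased ("0|1") and unphased ("0/1") genotypes the same way.
        alleles = set(gt.replace("|", "/").split("/"))
        is_het = len(alleles) > 1
        if is_het and pid is not None:
            positions[pid].append(pos)
    return {pid: (min(p), max(p), max(p) - min(p)) for pid, p in positions.items()}

# Hypothetical toy records, not real gnomAD data.
records = [
    (1000, "0/1", "1000_A_G"),
    (1060, "0|1", "1000_A_G"),
    (1150, "0/1", "1000_A_G"),
    (5000, "0/1", None),        # het without phasing information
    (7000, "1/1", "7000_C_T"),  # hom-var, ignored
]
result = max_span_per_phase_group(records)
largest = max(span for _, _, span in result.values())
```

Taking the max over all groups gives the "largest distance within one PID" figure; the per-group spans are also useful for spotting groups that extend beyond a single read.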
Oops, thanks for correcting and clarifying, @epiercehoffman!
I have a user on the forum asking whether being unable to phase MNPs is intended. https://gatkforums.broadinstitute.org/gatk/discussion/11122/pgt-and-pid-is-a-dot#latest
The PGT:PID shows up as `.:.` for these trailing SNPs, even though they are on the same reads as upstream SNPs that are phased together.
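One way to collect more examples like this is to scan for het calls whose PGT:PID is `.:.` but which sit within one read length of an upstream het that does carry a PID. This is a rough sketch, assuming records are pre-parsed into hypothetical `(pos, GT, PGT, PID)` string tuples and that `read_length` (150 here, an assumed value) matches the data. It is only a proximity heuristic: it does not check that the same reads actually span both sites.

```python
def unphased_trailing_hets(records, read_length=150):
    """Return positions of het calls whose PGT:PID is '.:.' but which lie within
    read_length bp downstream of the most recent het call that does carry a PID."""
    flagged = []
    last_phased_pos = None
    for pos, gt, pgt, pid in sorted(records):
        alleles = set(gt.replace("|", "/").split("/"))
        if len(alleles) < 2:
            continue  # skip hom-ref, hom-var and no-calls
        if pgt != "." and pid != ".":
            last_phased_pos = pos
        elif last_phased_pos is not None and pos - last_phased_pos <= read_length:
            flagged.append(pos)
    return flagged

# Hypothetical toy records.
records = [
    (100, "0/1", "0|1", "100_C_T"),
    (130, "0/1", "0|1", "100_C_T"),
    (160, "0/1", ".", "."),   # trailing het on the same reads, but unphased
    (900, "0/1", ".", "."),   # too far from any phased het to be suspicious
]
suspicious = unphased_trailing_hets(records)
```

Candidates flagged this way would still need to be checked in IGV or against the BAM before being reported as phasing failures.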
It's hard to say without seeing all of the data. I answered on the forum.
@ldgauthier any updates on a solution? We have an example of a clinically reportable variant that matches #5824.
Our long term solution is a rather large modification to the graph assembly code: https://github.com/broadinstitute/gatk/issues/5828
That will likely take a couple of months, but we fully expect a dramatic improvement in phasing. Since we're already working on that, spending time on a quick fix would just make the long-term fix take longer.
(In reply to Nils Homer, Thu, Apr 11, 2019.)
Thanks! I’ll keep watching; please let me know if you have a version I can try later this year.
Emma, a summer student in the MacArthur lab, analyzed HaplotypeCaller phasing from GATK 3.4 (which I believe has not changed since then) using the gnomAD exomes and genomes.
The goal of the phasing feature is to provide enough information to re-represent adjacent SNPs as MNPs, for more accurate functional annotation. However, only 90% of adjacent SNPs have phasing information. Analysts would prefer that 100% of adjacent SNPs have phasing information, along with a quality estimate.
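To make the MNP use case concrete: two SNPs at consecutive positions can only be merged into one MNP if they share a phase group and sit on the same haplotype (cis). This is a minimal sketch, assuming het SNPs are given as hypothetical `(pos, REF, ALT, PGT, PID)` tuples; it treats matching PGT strings within one PID as cis and differing ones (e.g. `0|1` vs `1|0`) as trans. It only handles pairs, so a run of three adjacent SNPs would produce overlapping MNPs, and it is not how any particular annotation tool does the merge.

```python
def classify_adjacent_pairs(snps):
    """snps: list of (pos, ref, alt, pgt, pid) for het SNPs in one sample.

    Returns (mnps, trans_pairs, unphased_pairs) for directly adjacent SNP pairs.
    """
    snps = sorted(snps)
    mnps, trans_pairs, unphased_pairs = [], [], []
    for a, b in zip(snps, snps[1:]):
        pos_a, ref_a, alt_a, pgt_a, pid_a = a
        pos_b, ref_b, alt_b, pgt_b, pid_b = b
        if pos_b != pos_a + 1:
            continue  # only directly adjacent positions are MNP candidates
        if pid_a == "." or pid_a != pid_b:
            # Missing PID or different phase groups: relative phase is unknown.
            unphased_pairs.append((pos_a, pos_b))
        elif pgt_a == pgt_b:
            # Same haplotype (cis): both alts on one chromosome, so merge.
            mnps.append((pos_a, ref_a + ref_b, alt_a + alt_b))
        else:
            # Different haplotypes (trans): keep as two separate SNPs.
            trans_pairs.append((pos_a, pos_b))
    return mnps, trans_pairs, unphased_pairs

# Hypothetical toy SNPs.
snps = [
    (200, "A", "G", "0|1", "200_A_G"), (201, "C", "T", "0|1", "200_A_G"),
    (500, "G", "A", "0|1", "500_G_A"), (501, "T", "C", "1|0", "500_G_A"),
    (800, "C", "G", ".", "."),         (801, "A", "T", ".", "."),
]
mnps, trans_pairs, unphased_pairs = classify_adjacent_pairs(snps)
n_phased = len(mnps) + len(trans_pairs)
fraction_phased = n_phased / (n_phased + len(unphased_pairs))
```

`fraction_phased` is the same kind of statistic as the 90% figure above; the pairs in `unphased_pairs` are the ones an annotator cannot safely merge or split.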
As a side note, most of the gnomAD exomes have 75 bp reads and the maximum graph assembly k-mer size is 65 bp, so it's interesting that some phasing information is output for SNPs as far as 310 bases apart (this isn't shown in the graph, but that's what Emma reported), especially considering that HaplotypeCaller is not mate-aware. Without digging into the data, I'm guessing these are cases where there's another het SNP in between.
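The "another het SNP in between" guess can be illustrated with a toy model. Suppose phase is only ever linked between two hets that both fit inside a single 75 bp read (a gap of at most 74 bp), with mates ignored. A phase group can then still stretch well past one read length by chaining through intermediate hets. This is a simplification for illustration, not how HaplotypeCaller actually builds phase groups, and the positions are made up.

```python
def chain_phase_groups(positions, read_length=75):
    """Toy model: group sorted het positions so that each consecutive pair in a
    group is close enough (gap <= read_length - 1) to be covered by one read."""
    groups = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][-1] <= read_length - 1:
            groups[-1].append(pos)  # linked to the previous het by a shared read
        else:
            groups.append([pos])    # too far away: start a new group
    return groups

# Hypothetical het positions: each step is at most 70 bp, but the chain spans 310 bp.
groups = chain_phase_groups([1000, 1070, 1140, 1210, 1280, 1310, 2000])
spans = [g[-1] - g[0] for g in groups]
```

Here no single 75 bp read covers both 1000 and 1310, yet they land in one group, which is the pattern the 310 bp observation would need if there is another het in between.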