zlskidmore opened this issue 5 years ago
This is becoming more important for clinical applications, including cancer vaccine design, where manual review has caught several MNPs annotated as separate SNPs, requiring reannotation and reevaluation of peptides/binding. We should add a step that attempts to resolve them. Does anyone know of a tool that does this? Perhaps something could use GATK HaplotypeCaller to do it?
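A minimal sketch of a first, phase-agnostic step: flag runs of SNVs at adjacent positions so they can be sent for manual review or re-evaluation. The `Snv` class and `candidate_mnp_clusters` function are hypothetical names, and the sketch assumes one record per position (real input would come from a VCF parser). A flagged cluster is only a candidate; without phasing it could equally be two SNPs on different haplotypes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Snv:
    chrom: str
    pos: int  # 1-based VCF POS
    ref: str
    alt: str


def candidate_mnp_clusters(snvs: List[Snv], max_gap: int = 1) -> List[List[Snv]]:
    """Group SNVs lying within max_gap bp of the previous SNV on the same chromosome.

    Only groups of two or more SNVs are returned. These are review candidates,
    not confirmed MNPs (phase is not checked here).
    """
    ordered = sorted(snvs, key=lambda s: (s.chrom, s.pos))
    clusters: List[List[Snv]] = []
    current: List[Snv] = []
    for snv in ordered:
        same_run = (
            current
            and snv.chrom == current[-1].chrom
            and snv.pos - current[-1].pos <= max_gap
        )
        if same_run:
            current.append(snv)
        else:
            if len(current) > 1:
                clusters.append(current)
            current = [snv]
    if len(current) > 1:
        clusters.append(current)
    return clusters
```

Raising `max_gap` to 2 would also catch SNVs that share a codon but are not directly adjacent, at the cost of more false candidates.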
The proximal variants functionality of pvactools will fix the peptides that come out. But I agree that it would be nice if they were just represented properly in the VCF.
Qingsong Gao, Ph.D., who was in Li’s lab and has since moved to St. Jude, had a solution that merged adjacent SNPs into DNPs, MNPs, ONPs, etc. when they were in phase. It was called COCOONS, but I don’t see the code in their repo anymore. I think it was merged into their somatic wrapper, https://github.com/ding-lab/TinDaisy; look here: https://github.com/ding-lab/mnp_filter
Another option for fixing VCFs that carry phasing information (such as those Mutect2 can produce): https://github.com/Sentieon/sentieon-scripts/tree/master/merge_mnp
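A rough sketch of the general idea of merging in-phase adjacent SNPs into a single MNP record. This is not the code of COCOONS, mnp_filter, or merge_mnp. It assumes phase is encoded with a `|` genotype separator plus a phase-set ID (like the VCF `FORMAT/PS` field); the `PhasedSnv` class and function names are hypothetical, so adapt them to whatever phasing fields your caller actually writes. Only strictly adjacent positions are merged, and multi-allelic sites are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Merged = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


@dataclass(frozen=True)
class PhasedSnv:
    chrom: str
    pos: int            # 1-based VCF POS
    ref: str
    alt: str
    gt: str             # e.g. "0|1"; "|" marks a phased genotype
    ps: Optional[int]   # phase-set ID, None if absent


def same_haplotype(a: PhasedSnv, b: PhasedSnv) -> bool:
    """True if both calls are phased in the same phase set with alleles on the same haplotypes."""
    if "|" not in a.gt or "|" not in b.gt or a.ps is None or a.ps != b.ps:
        return False
    return a.gt.split("|") == b.gt.split("|")


def collapse(run: List[PhasedSnv]) -> Merged:
    # Concatenate REF and ALT bases of a run of adjacent, in-phase SNVs.
    return (run[0].chrom, run[0].pos,
            "".join(s.ref for s in run), "".join(s.alt for s in run))


def merge_adjacent_phased(snvs: List[PhasedSnv]) -> List[Merged]:
    ordered = sorted(snvs, key=lambda s: (s.chrom, s.pos))
    merged: List[Merged] = []
    run: List[PhasedSnv] = []
    for snv in ordered:
        extends_run = (
            run
            and snv.chrom == run[-1].chrom
            and snv.pos == run[-1].pos + 1
            and same_haplotype(run[-1], snv)
        )
        if extends_run:
            run.append(snv)
        else:
            if run:
                merged.append(collapse(run))
            run = [snv]
    if run:
        merged.append(collapse(run))
    return merged
```

SNVs in trans (e.g. `0|1` next to `1|0`) stay separate, which is the whole point of requiring phase rather than just adjacency.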
It may also be possible to tackle this problem at the annotation stage:
Revisiting this issue today. In the pipeline's current state, we get duplicate calls: the DNPs come from Mutect and the SNPs come from another caller. This is objectively wrong, and we need to handle this merging properly. For example:
chr17 7673787 G A
chr17 7673787 GG AA
chr17 7673788 G A
chr17 7675993 C T
chr17 7675993 CC TT
chr17 7675994 C T
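One way the duplicates above could be resolved, sketched under the assumption that the multi-base call should win: drop any single-base record whose position and alt base are already represented inside a same-length multi-base substitution. Records are simplified to `(chrom, pos, ref, alt)` tuples and `drop_snvs_covered_by_mnps` is a hypothetical helper; a real implementation would also need to merge caller-specific INFO/FORMAT fields, which this ignores.

```python
from typing import List, Set, Tuple

Record = Tuple[str, int, str, str]  # (chrom, pos, ref, alt), pos is 1-based


def drop_snvs_covered_by_mnps(records: List[Record]) -> List[Record]:
    """Remove SNV records already represented by a multi-base substitution record."""
    covered: Set[Record] = set()
    for chrom, pos, ref, alt in records:
        # Only same-length multi-base substitutions (DNP/TNP/ONP), not indels.
        if len(ref) > 1 and len(ref) == len(alt):
            for offset, (r, a) in enumerate(zip(ref, alt)):
                if r != a:
                    covered.add((chrom, pos + offset, r, a))
    return [
        rec for rec in records
        if not (len(rec[2]) == 1 and len(rec[3]) == 1 and rec in covered)
    ]
```

Whether to prefer the MNP call, or to require that both callers agree before collapsing, is a policy choice; this sketch simply keeps the MNP.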
One thing to be aware of is that once these SNPs are merged into DNPs, bam-readcount will no longer be able to process them.
Can't bam-readcount count these through its indel support? A DNP is essentially a delins. Maybe bam-readcount doesn't support those either...
It does not, and even multi-bp indels are kind of iffy with bam-readcount, since it doesn't do any local realignment the way variant callers often do. For years, we've discussed possible alternatives to bam-readcount, such as prioritizing the values from one caller or another in different situations, but it's one of those seemingly simple things that gets complicated down in the weeds, and no one has had the bandwidth to implement it.
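The core difficulty with DNP read counts is that support has to be judged per read, not per position: counting each position separately cannot tell whether both alt bases came from the same read. A minimal sketch of read-level counting follows, assuming each read has already been reduced to a hypothetical `{ref_pos: base}` map of its aligned bases (with pysam, something similar could be built from `AlignedSegment.get_aligned_pairs(matches_only=True)` and `query_sequence`, keeping in mind pysam's 0-based coordinates). Base and mapping quality filters are omitted.

```python
from typing import Dict, Iterable


def count_mnp_support(reads: Iterable[Dict[int, str]], pos: int,
                      ref: str, alt: str) -> Dict[str, int]:
    """Classify reads by the bases they carry across a len(ref)-bp substitution at pos."""
    positions = range(pos, pos + len(ref))
    counts = {"ref": 0, "alt": 0, "other": 0, "not_spanning": 0}
    for read in reads:
        # A read must cover every base of the substitution to be informative.
        if not all(p in read for p in positions):
            counts["not_spanning"] += 1
            continue
        observed = "".join(read[p] for p in positions)
        if observed == alt:
            counts["alt"] += 1
        elif observed == ref:
            counts["ref"] += 1
        else:
            counts["other"] += 1
    return counts
```

A read carrying only one of the two alt bases lands in `other`; per-position counting would instead credit it as alt support at that one position.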
Leaving this as a reminder to myself:
In the workflows I've run, I've encountered multiple cases where two SNPs are annotated separately when in reality they form a DNP, and should therefore receive different consequence annotations than the ones reported.
There is a tool that takes care of this: https://github.com/hubentu/MAC (see also https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4521406/).
We should incorporate something like this in the workflows.
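To illustrate why separate SNP annotations can be wrong, here is a small worked example using a made-up codon (not one of the variants above). Two substitutions in the same codon each give one amino acid change on their own, but a different one when they sit on the same haplotype. The sketch assumes a forward-strand codon and the standard genetic code; real annotation also has to account for transcript strand and exon boundaries, which tools like MAC handle.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ("*" = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)


def translate_codon(codon: str) -> str:
    i, j, k = (BASES.index(b) for b in codon)
    return AMINO_ACIDS[16 * i + 4 * j + k]


def apply_substitutions(codon: str, subs: dict) -> str:
    """subs maps a 0-based offset within the codon to its alt base."""
    bases = list(codon)
    for offset, alt in subs.items():
        bases[offset] = alt
    return "".join(bases)


ref_codon = "GGC"  # hypothetical Gly codon
separately = [
    translate_codon(apply_substitutions(ref_codon, {0: "A"})),
    translate_codon(apply_substitutions(ref_codon, {1: "A"})),
]
together = translate_codon(apply_substitutions(ref_codon, {0: "A", 1: "A"}))
print(translate_codon(ref_codon), separately, together)  # G ['S', 'D'] N
```

Annotated as two SNPs, this reads as Gly>Ser and Gly>Asp; annotated as the DNP it actually is, it is Gly>Asn, which changes the downstream peptide.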