Illumina / Nirvana

The nimble & robust variant annotator
https://illumina.github.io/NirvanaDocumentation/
GNU General Public License v3.0

Possible error / point of confusion around canonical variant IDs in docs #44

Closed: edawson closed this issue 3 years ago

edawson commented 3 years ago

Hi,

While reviewing the docs, I got confused by an example in the canonical variant ID section.

The final variant listed in the docs appears to have an incorrect reference allele in the variant ID for its second alternate allele:

chr1    66572   .   GTA G,GTACTATATATTATA   45.45   PASS    .
##Format: chromosome-position-reference allele-alternate allele
    1-66572-GTA-G
    1-66572-G-GTACTATATATTA
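To see where the IDs come from, here is a minimal Python sketch that splits the multi-allelic VCF line into one (chrom, pos, ref, alt) tuple per ALT allele and joins the fields with hyphens, without any normalization. The function names are hypothetical, and dropping the leading "chr" is only inferred from the docs example (chr1 becomes 1), not taken from Nirvana's code.

```python
from typing import List, Tuple


def split_alleles(vcf_line: str) -> List[Tuple[str, int, str, str]]:
    """One (chrom, pos, ref, alt) tuple per comma-separated ALT allele."""
    chrom, pos, _id, ref, alts = vcf_line.split()[:5]
    return [(chrom, int(pos), ref, alt) for alt in alts.split(",")]


def format_variant_id(chrom: str, pos: int, ref: str, alt: str) -> str:
    """chromosome-position-reference-alternate, with a leading 'chr' dropped (assumed)."""
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return f"{chrom}-{pos}-{ref}-{alt}"


line = "chr1    66572   .   GTA G,GTACTATATATTATA   45.45   PASS    ."
for allele in split_alleles(line):
    print(format_variant_id(*allele))
# 1-66572-GTA-G
# 1-66572-GTA-GTACTATATATTATA
```

Formatting the raw alleles this way reproduces the first ID but not the second, which is exactly the discrepancy raised in this issue: the documented ID reflects alleles that have been trimmed and left-aligned first.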

Did I perhaps not left-align + trim this correctly in my brain? It looks to me like this variant ID should read 1-66572-GTA-GTACTATATATTA. If it were trimmed, I think it might read 1-66575-A-ACTATATATTA. Any clarification y'all might have on this would be great - thanks!

MichaelStromberg commented 3 years ago

Hi Eric,

Your intuition was correct. In addition to trimming, we normalize variants according to standard NGS conventions (left alignment). The entire process for creating the variant ID looks like this:

| Step | Result |
| --- | --- |
| 1. Original VCF alleles | chr1 66572 . GTA GTACTATATATTATA |
| 2. Trimmed alleles | chr1 66575 . - CTATATATTATA |
| 3. Left-aligned alleles | chr1 66573 . - TACTATATATTA |
| 4. Add back the padding base | chr1 66572 . G GTACTATATATTA |

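The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Nirvana's implementation: the function names are hypothetical, the reference is passed as a string `genome` whose first base sits at the 1-based position `genome_start`, and an empty string stands in for the "-" allele. Only the three REF bases (GTA at 66572) are needed here, because the left shift stops at the G.

```python
from typing import Optional, Tuple

# (position, ref, alt); "" plays the role of "-" in the table above
Allele = Tuple[int, str, str]


def base_at(genome: str, genome_start: int, pos: int) -> Optional[str]:
    """Reference base at 1-based `pos`, or None outside the known window."""
    i = pos - genome_start
    return genome[i] if 0 <= i < len(genome) else None


def trim(pos: int, ref: str, alt: str) -> Allele:
    """Step 2: drop shared leading bases (advancing pos), then shared trailing bases."""
    while ref and alt and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    return pos, ref, alt


def left_align(pos: int, ref: str, alt: str, genome: str, genome_start: int) -> Allele:
    """Step 3: slide a pure insertion or deletion left while the haplotype is unchanged."""
    if bool(ref) == bool(alt):  # SNV/MNV/complex or no change: nothing to shift
        return pos, ref, alt
    seq = ref or alt
    # If the base just before the indel equals its last base, rotating the
    # indel one position to the left yields the same sequence.
    while base_at(genome, genome_start, pos - 1) == seq[-1]:
        seq = seq[-1] + seq[:-1]
        pos -= 1
    return (pos, seq, "") if ref else (pos, "", seq)


def pad(pos: int, ref: str, alt: str, genome: str, genome_start: int) -> Allele:
    """Step 4: prepend the preceding reference base when an allele is empty."""
    if ref and alt:
        return pos, ref, alt
    base = base_at(genome, genome_start, pos - 1)
    if base is None:
        raise ValueError("padding base lies outside the reference window")
    return pos - 1, base + ref, base + alt


def variant_id(chrom: str, pos: int, ref: str, alt: str, genome: str, genome_start: int) -> str:
    pos, ref, alt = trim(pos, ref, alt)
    pos, ref, alt = left_align(pos, ref, alt, genome, genome_start)
    pos, ref, alt = pad(pos, ref, alt, genome, genome_start)
    chrom = chrom[3:] if chrom.startswith("chr") else chrom  # assumed naming rule
    return f"{chrom}-{pos}-{ref}-{alt}"


print(variant_id("chr1", 66572, "GTA", "GTACTATATATTATA", "GTA", 66572))  # 1-66572-G-GTACTATATATTA
print(variant_id("chr1", 66572, "GTA", "G", "GTA", 66572))                # 1-66572-GTA-G
```

Trimming the shared prefix first reproduces the intermediate row at 66575; trimming the suffix first would land directly at 66573, but for this variant the final ID is the same either way. The deletion allele (GTA to G) cannot shift left, since the base before TA is G, so its ID keeps the original position.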
MichaelStromberg commented 3 years ago

Here's a visual depiction of the alignment process. Notice that we're really just changing the location of the genomic TA alleles.
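The same point can be checked by building the local haplotype for each placement of the insertion. In this sketch `insert_before` is a hypothetical helper, and only the REF bases GTA at chr1:66572-66574 are used; any bases after 66574 would be appended identically in every case, so they are left out.

```python
def insert_before(window: str, window_start: int, pos: int, inserted: str) -> str:
    """Haplotype from inserting `inserted` immediately before 1-based reference position `pos`."""
    cut = pos - window_start
    return window[:cut] + inserted + window[cut:]


window, start = "GTA", 66572  # REF bases at chr1:66572-66574
placements = [
    (66575, "CTATATATTATA"),  # trimmed insertion (step 2)
    (66574, "ACTATATATTAT"),  # shifted left by one base
    (66573, "TACTATATATTA"),  # fully left-aligned (step 3)
]
haplotypes = {insert_before(window, start, pos, seq) for pos, seq in placements}
print(haplotypes)  # {'GTACTATATATTATA'}
```

All three placements spell the original ALT allele GTACTATATATTATA; moving the insertion left only rotates its sequence across the reference TA bases.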

edawson commented 3 years ago

Ah, I see. This was very helpful and a really interesting one once you showed the ref bases - thanks!!