WGLab / doc-ANNOVAR

Documentation for the ANNOVAR software
http://annovar.openbioinformatics.org

Different ANNOVAR versions result in different aachange positions! How does that happen? #99

Closed Bio-MingChen closed 4 years ago

Bio-MingChen commented 4 years ago

Hi, when I used different versions of the ANNOVAR software, I got different amino acid positions in the aachange column, as shown below.

Before the update, with version 2018-4-16:

A TET2,TET2-AS1 4 106197102 frameshift deletion DEL TC T c.5436delC p.V1812fs NM_001127208
B TET2,TET2-AS1 4 106197334 frameshift deletion DEL CAA C c.5668_5669del p.N1890fs NM_001127208
C TET2,TET2-AS1 4 106158407 frameshift deletion DEL ATT A c.3309_3310del p.N1103fs NM_001127208
D ASXL1 20 31022441 frameshift insertion INS A AG c.1927dupG p.G642fs NM_015338
E ASXL1 20 31022438 frameshift deletion DEL CGGAGG C c.1924_1928del p.G642fs NM_015338

After the update, with version 2019-10-24:

A TET2-AS1,TET2 4 106197102 frameshift deletion DEL TC T c.5436delC p.Q1813Kfs*7 NM_001127208
B TET2-AS1,TET2 4 106197334 stopgain DEL CAA C c.5668_5669del p.N1890* NM_001127208
C TET2-AS1,TET2 4 106158407 frameshift deletion DEL ATT A c.3309_3310del p.F1104Yfs*25 NM_001127208
D ASXL1 20 31022441 frameshift insertion INS A AG c.1927dupG p.G646Wfs*12 NM_015338
E ASXL1 20 31022438 frameshift deletion DEL CGGAGG C c.1924_1928del p.G644Wfs*12 NM_015338

You can see that sites A and C have different aachange positions (from 1812 to 1813 and from 1103 to 1104), site B's variant type changes from frameshift deletion to stopgain, and sites D and E also have different positions after the version update. So how does this happen? Which result should I believe? Looking forward to your answer, and best wishes to the ANNOVAR team!

kaichop commented 4 years ago

An update sometime in 2019 made "-polish" the default, and I guess that is the reason you see the difference, if you indeed used table_annovar.pl. Your question did not include the command, so it is hard for me to tell.

In the first case, c.5436 should normally affect protein position 1812 (because if you divide it by 3, you get 1812). However, it is most likely that after deleting the C, the amino acid at 1812 does not really change (it remains a V, since the base that shifts into the codon is also a C), so the "polish" version prints 1813 instead. In the second case, it is a nomenclature issue. It is both a frameshift deletion and a stop gain. However, most frameshift changes will eventually result in a stop gain, so to distinguish the situation where a stop codon is introduced immediately at the affected codon, I use "stopgain" for this specific situation as a "polishing" step.
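To make the reasoning above concrete, here is a minimal Python sketch of the idea: translate the reference and mutated coding sequences and report the first residue that actually differs. This is not ANNOVAR's code, the function names are hypothetical, and the 18-base sequence is an invented toy, not the TET2 transcript. It is built so that deleting a C at the end of a valine codon leaves that codon unchanged, mirroring the pattern at site A.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_number(cdna_pos):
    """1-based codon index that contains a 1-based cDNA position."""
    return (cdna_pos + 2) // 3


def translate(cds):
    """Translate complete codons, stopping after the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def first_changed_residue(ref_cds, alt_cds):
    """Return (position, ref_aa, alt_aa) for the first differing residue, or None."""
    pairs = zip(translate(ref_cds), translate(alt_cds))
    for pos, (ref_aa, alt_aa) in enumerate(pairs, start=1):
        if ref_aa != alt_aa:
            return pos, ref_aa, alt_aa
    return None


# Invented toy CDS: ATG GTC CAA AAT TGG TAA -> M V Q N W *
ref = "ATGGTCCAAAATTGGTAA"
del_pos = 6                               # delete the C ending codon 2 (GTC)
alt = ref[:del_pos - 1] + ref[del_pos:]   # ATG GTC AAA ATT GGT AA

print(codon_number(5436))                 # 1812
print(codon_number(del_pos))              # 2
print(first_changed_residue(ref, alt))    # (3, 'Q', 'K')
```

The deleted base lies in codon 2, but codon 2 still reads GTC, so the first changed residue is at position 3. That is the same kind of one-codon shift as the move from 1812 to 1813 discussed above.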

Again, if you indeed used table_annovar.pl, you can compare the "-nopolish" and "-polish" options to see the difference. Polishing makes more sense, which is why I made it the default nowadays.
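One way to run that comparison is to build two otherwise identical table_annovar.pl commands that differ only in the polish switch. The sketch below only assembles the argument lists and does not run anything. The input file, database directory, output prefixes, and the hg19/refGene settings are assumptions for illustration, not values taken from this thread.

```python
def table_annovar_command(avinput, dbdir, out_prefix, polish=True):
    """Build an argument list for table_annovar.pl (all paths are hypothetical)."""
    cmd = [
        "table_annovar.pl", avinput, dbdir,
        "-buildver", "hg19",        # assumed genome build
        "-out", out_prefix,
        "-remove",
        "-protocol", "refGene",     # assumed gene-based annotation only
        "-operation", "g",
        "-nastring", ".",
    ]
    # The only difference between the two runs is this switch.
    cmd.append("-polish" if polish else "-nopolish")
    return cmd


polished = table_annovar_command("sample.avinput", "humandb/", "out_polish", polish=True)
unpolished = table_annovar_command("sample.avinput", "humandb/", "out_nopolish", polish=False)

print(" ".join(polished))
print(" ".join(unpolished))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where ANNOVAR is installed, and the aachange columns of the two outputs can then be compared side by side.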

Bio-MingChen commented 4 years ago

That's very kind of you, and thanks for your answer. I've checked sites A, C, D, and E, and all of them match the situation you described. I wrote a small tool to show this, as below:

Site A c.5436delC

image

Site C c.3309_3310del

image

Site D c.1927dupG

image

Site E c.1924_1928del

image

The first line is the cDNA coordinate and the second line is the amino acid coordinate. The third line is the reference codon sequence and the fourth line is the reference amino acid sequence. The fifth and sixth lines show the base and amino acid sequences after the mutation.
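The original script is not included in this thread, so the following is only a rough Python sketch of how such a six-line view could be produced. The function names, the column width, and the toy sequence are all assumptions, and the sequence is invented rather than taken from TET2 or ASXL1.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(cds):
    """Cut a coding sequence into codons; the last one may be incomplete."""
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def amino_acids(codons):
    """Translate each codon; '-' marks an incomplete trailing codon."""
    return [CODON_TABLE.get(codon, "-") for codon in codons]


def format_alignment(ref_cds, alt_cds, first_codon=1, width=6):
    """Return the six-line view described above as a single string."""
    ref_codons = split_codons(ref_cds)
    alt_codons = split_codons(alt_cds)
    n = len(ref_codons)
    rows = [
        # cDNA coordinate of the first base of each reference codon
        [str(3 * (first_codon + i - 1) + 1) for i in range(n)],
        # amino acid coordinate
        [str(first_codon + i) for i in range(n)],
        ref_codons,
        amino_acids(ref_codons),
        alt_codons,
        amino_acids(alt_codons),
    ]
    return "\n".join(
        "".join(cell.ljust(width) for cell in row).rstrip() for row in rows
    )


# Invented toy CDS with a single-base deletion at cDNA position 6.
ref = "ATGGTCCAAAATTGGTAA"
alt = ref[:5] + ref[6:]
view = format_alignment(ref, alt)
print(view)
```

Padding every cell to a fixed width keeps codons and residues in aligned columns, which makes it easy to spot the first column where the reference and mutated amino acids differ.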

Bio-MingChen commented 4 years ago

I love the ANNOVAR software, and many thanks to Professor Wang! I will close this issue after one week in case anyone else has questions about it.

kaichop commented 4 years ago

@KingCM Just wondering whether you can share the script to print out the formatted nucleotides and amino acids with position information, as shown in the figure. It may be helpful to other users.