WGLab / doc-ANNOVAR

Documentation for the ANNOVAR software
http://annovar.openbioinformatics.org
Fields altered in the output in table format #113

Closed jfnavarro closed 4 years ago

jfnavarro commented 4 years ago

Hello,

convert2annovar.pl trims some of the variants, specifically insertions and deletions.

Example (input):

111991946 . TC T

Output:

111991947 111991947 C -

I believe the change is performed in the function "adjustStartEndRefAlt". Is there a way to disable this adjustment so that the output follows the VCF standard?

Thanks!
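
For readers following along, the adjustment described above can be sketched in Python. This is only an illustration of the trimming behaviour seen in the example, not the actual code of "adjustStartEndRefAlt"; the function name `vcf_to_annovar_coords` is hypothetical, and the handling of insertions (start and end set to the base before the inserted sequence) is an assumption about ANNOVAR's convention.

```python
def vcf_to_annovar_coords(pos, ref, alt):
    """Sketch: turn a VCF POS/REF/ALT triple into (start, end, ref, alt).

    Hypothetical helper, not part of ANNOVAR. It strips the bases that
    REF and ALT share at their left end and shifts the position accordingly.
    """
    # Drop the shared leading bases (the "anchor" base of a VCF indel).
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1

    if not ref:
        # Insertion: nothing left of REF. Assumed convention: start and end
        # both point at the base just before the inserted sequence.
        return pos - 1, pos - 1, "-", alt

    # Deletion or substitution: the end covers the remaining REF bases.
    end = pos + len(ref) - 1
    return pos, end, ref, alt or "-"


# The deletion from this post: POS 111991946, REF "TC", ALT "T".
print(vcf_to_annovar_coords(111991946, "TC", "T"))
```

With the input from this post, the sketch yields start 111991947, end 111991947, ref "C" and alt "-", which matches the convert2annovar.pl output shown above.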

kaichop commented 4 years ago

Users are not supposed to run convert2annovar.pl themselves unless they are already very familiar with ANNOVAR. (There is an argument to disable this feature in convert2annovar, but there is no reason to use it; use table_annovar instead for all your annotation needs.)

jfnavarro commented 4 years ago

Thanks for your answer. I have now discovered the --vcfinput option, which does the job, so I do not need to use convert2annovar (I was working with some legacy code that ran this script prior to annotation with table_annovar). However, I could not find any flag or code that would disable the call to "adjustStartEndRefAlt" in convert2annovar. Thanks again for the help!

jfnavarro commented 4 years ago

The problem is still there when I look at the output of table_annovar in table format (txt), though not in the VCF output. Why would they differ for the deletions?

kaichop commented 4 years ago

There are a few columns in table_annovar output that reflect the exact fields in the original VCF file.

jfnavarro commented 4 years ago

Of course some fields are identical, but the POS, REF and ALT fields are not the same for certain types of variants. Is this the expected behavior?

kaichop commented 4 years ago

No, the full original record from the VCF file should be in the final output file (it is NOT in the first five columns, but in the columns after the annotation). Please send a specific example line if you observe otherwise.

jfnavarro commented 4 years ago

I can show you plenty of examples from real data (both hg19 and GRCh38). Like this one:

Input to Annovar (vcf):

chr1 111449324 . TC T . PASS AC=1;AF=0.250;AN=4;AS_FilterStatus=SITE;AS_SB_TABLE=428,604|44,60;DP=1188;ECNT=1;GERMQ=93;MBQ=35,35;MFRL=197,196;MMQ=60,60;MPOS=20;NALOD=2.66;NLOD=128.99;POPAF=6.00;RPA=2,1;RU=C;STR;STRQ=93;TLOD=228.09;set=mutect GT:AD:AF:DP:F1R2:F2R1:SB 0/0:487,0:2.262e-03:487:364,0:100,0:200,287,0,0 0/1:545,104:0.160:649:222,44:299,57:228,317,44,60 ./. ./. ./. ./. ./. ./.

Output from Annovar (table format)

chr1 111449325 111449325 C - upstream ATP5PB;WDR77 dist=139 . . UTR5 ENST00000235090.9;ENST00000369722.7 ENST00000235090.9:c.-156delG;ENST00000369722.7:c.-217del- . . UTR5 ATP5F1;WDR77 ENST00000369722.7:c.-217del-;ENST00000235090.9:c.-156delG . . . . . . . . . 0.25 . 1188 chr1 111449324 . TC T . PASS AC=1;AF=0.250;AN=4;AS_FilterStatus=SITE;AS_SB_TABLE=428,604|44,60;DP=1188;ECNT=1;GERMQ=93;MBQ=35,35;MFRL=197,196;MMQ=60,60;MPOS=20;NALOD=2.66;NLOD=128.99;POPAF=6.00;RPA=2,1;RU=C;STR;STRQ=93;TLOD=228.09;set=mutect GT:AD:AF:DP:F1R2:F2R1:SB 0/0:487,0:2.262e-03:487:364,0:100,0:200,287,0,0 0/1:545,104:0.160:649:222,44:299,57:228,317,44,60 ./. ./. ./. ./. ./. ./.

As you can see, the variant has been trimmed and the position has been increased by 1. This does not happen in the VCF output:

chr1 111449324 . TC T . PASS AC=1;AF=0.250;AN=4;AS_FilterStatus=SITE;AS_SB_TABLE=428,604|44,60;DP=1188;ECNT=1;GERMQ=93;MBQ=35,35;MFRL=197,196;MMQ=60,60;MPOS=20;NALOD=2.66;NLOD=128.99;POPAF=6.00;RPA=2,1;RU=C;STR;STRQ=93;TLOD=228.09;set=mutect;ANNOVAR_DATE=2019-10-24;Func.refGene=upstream;Gene.refGene=ATP5PB\x3bWDR77;GeneDetail.refGene=dist\x3d139;ExonicFunc.refGene=.;AAChange.refGene=.;Func.knownGene=UTR5;Gene.knownGene=ENST00000235090.9\x3bENST00000369722.7;GeneDetail.knownGene=ENST00000235090.9:c.-156delG\x3bENST00000369722.7:c.-217del-;ExonicFunc.knownGene=.;AAChange.knownGene=.;Func.ensGene=UTR5;Gene.ensGene=ATP5F1\x3bWDR77;GeneDetail.ensGene=ENST00000369722.7:c.-217del-\x3bENST00000235090.9:c.-156delG;ExonicFunc.ensGene=.;AAChange.ensGene=.;avsnp150=.;ALL.sites.2015_08=.;EUR.sites.2015_08=.;AMR.sites.2015_08=.;EAS.sites.2015_08=.;AFR.sites.2015_08=.;cosmic70=.;ALLELE_END GT:AD:AF:DP:F1R2:F2R1:SB 0/0:487,0:2.262e-03:487:364,0:100,0:200,287,0,0 0/1:545,104:0.160:649:222,44:299,57:228,317,44,60 ./. ./. ./. ./. ./. ./.

kaichop commented 4 years ago

As I said earlier, the first five columns are ANNOVAR-specific columns. Your original VCF information is in the columns after all the annotation ("chr1 111449324 . TC T . PASS"); if you read the output, you will see these columns after the "1188" number.
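
To make the layout concrete, here is a minimal Python sketch that separates the ANNOVAR-style coordinates at the start of a table row from the original VCF record at its end. It assumes the table is tab-separated and that the caller knows how many sample columns the input VCF had; the function name `split_table_row` and the shortened example row are made up for illustration and are not part of ANNOVAR.

```python
def split_table_row(line, n_samples):
    """Sketch: split one table-format row into ANNOVAR coordinates and
    the original VCF fields (hypothetical helper, assumes tab separation).
    """
    fields = line.rstrip("\n").split("\t")
    # A VCF record has 9 fixed columns (CHROM POS ID REF ALT QUAL FILTER
    # INFO FORMAT) followed by one column per sample.
    n_vcf = 9 + n_samples
    if len(fields) < 5 + n_vcf:
        raise ValueError("row is too short for the given number of samples")

    annovar = dict(zip(("chr", "start", "end", "ref", "alt"), fields[:5]))
    vcf_fields = fields[-n_vcf:]  # the original record sits at the end of the row
    vcf = {
        "CHROM": vcf_fields[0],
        "POS": vcf_fields[1],
        "REF": vcf_fields[3],
        "ALT": vcf_fields[4],
    }
    return annovar, vcf


# Shortened, made-up row modelled on the example in this thread (one sample).
row = "\t".join([
    "chr1", "111449325", "111449325", "C", "-",    # ANNOVAR coordinates
    "upstream", "0.25", ".", "1188",               # annotation / other columns
    "chr1", "111449324", ".", "TC", "T", ".", "PASS", "DP=1188", "GT", "0/1",
])
annovar, vcf = split_table_row(row, n_samples=1)
print(annovar["start"], annovar["ref"], annovar["alt"])
print(vcf["POS"], vcf["REF"], vcf["ALT"])
```

Counting columns from the end of the row avoids having to know how many annotation columns a particular set of protocols produced.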

jfnavarro commented 4 years ago

I see what you mean now :)