WGLab / doc-ANNOVAR

Documentation for the ANNOVAR software
http://annovar.openbioinformatics.org

Error when Annotating vcf from Strelka #64

Closed itamuria closed 5 years ago

itamuria commented 5 years ago

Dear,

I tried to use ANNOVAR to annotate the VCF output file from Strelka.

When I ran table_annovar.pl on the VCF file:

./table_annovar.pl $inputfile $annoted_databases_folder/ -buildver hg38 -out $outfile -remove \
    -protocol refGene,exac03,ljb26_all,avsnp150,ALL.sites.2015_08 \
    -operation g,f,f,f,f -nastring . -vcfinput

I got the error below:

Error: invalid record in VCF file: the GT specifier is not present in the FORMAT string

I used the same files with other variant callers and they worked. How could I include the missing fields?

Thanks, All the best, Ibon

kaichop commented 5 years ago

You should not use ALL.sites.2015_08 in your command line; read FAQ #1 and #4.

ljb26_all is extremely outdated and should not be used. You need to list the actual message that you see on the screen. I suspect that you are using an extremely old version of ANNOVAR, but I cannot tell for sure.

itamuria commented 5 years ago

I have followed your advice, and the command is now:

./table_annovar.pl $infile $annoted_databases_folder/ -buildver hg38 -out $outfile -remove \
    -protocol refGene,exac03,avsnp150 \
    -operation g,f,f -nastring . -vcfinput

But I still get the same error:

NOTICE: Running with system command <convert2annovar.pl -includeinfo -allsample -withfreq -format vcf4 /home/somatic.snvs.vcf > /home/strelka_annotated_somatic_snp.vcf.avinput>
Error: invalid record in VCF file: the GT specifier is not present in the FORMAT string: <chr1 13418 . G A . LowEVS SOMATIC;QSS=50;TQSS=2;NT=ref;QSS_NT=50;TQSS_NT=2;SGT=GG->AG;DP=481;MQ=22.53;MQ0=224;ReadPosRankSum=-2.33;SNVSB=0.00;SomaticEVS=1.28 DP:FDP:SDP:SUBDP:AU:CU:GU:TU 83:0:0:0:1,5:0,0:82,150:0,0 155:0:0:0:9,21:0,0:146,305:0,0>
Error running system command: <convert2annovar.pl -includeinfo -allsample -withfreq -format vcf4 /home/somatic.snvs.vcf > /home/strelka_annotated_somatic_snp.vcf.avinput>

I believe my ANNOVAR version is up to date:

our $REVISION = '$Revision: 9f9e0f9efe83690a15a6aeb7714f1fc3a2341deb $';
our $DATE = '$Date: 2018-04-16 00:47:49 -0400 (Mon, 16 Apr 2018) $';
our $AUTHOR = '$Author: Kai Wang kaichop@gmail.com $';

What else could I try? Thanks.
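
The error message says that the FORMAT column of the record does not list GT. A minimal Python sketch for checking this on a record before running ANNOVAR is given here; the function name `format_has_gt` is hypothetical, and the example record is the one from the log above with tabs restored, on the assumption that the original file is tab-delimited as the VCF specification requires.

```python
def format_has_gt(record: str) -> bool:
    """Return True if the FORMAT column (9th field) of a VCF data line lists GT."""
    fields = record.rstrip("\n").split("\t")
    if len(fields) < 9:
        # No FORMAT column at all (sites-only VCF), so there is no GT either.
        return False
    return "GT" in fields[8].split(":")


# The record from the error message, with tabs restored (an assumption).
strelka_record = "\t".join([
    "chr1", "13418", ".", "G", "A", ".", "LowEVS",
    "SOMATIC;QSS=50;TQSS=2;NT=ref;QSS_NT=50;TQSS_NT=2;SGT=GG->AG;DP=481;"
    "MQ=22.53;MQ0=224;ReadPosRankSum=-2.33;SNVSB=0.00;SomaticEVS=1.28",
    "DP:FDP:SDP:SUBDP:AU:CU:GU:TU",
    "83:0:0:0:1,5:0,0:82,150:0,0",
    "155:0:0:0:9,21:0,0:146,305:0,0",
])

print(format_has_gt(strelka_record))  # False
```

The FORMAT string here is DP:FDP:SDP:SUBDP:AU:CU:GU:TU, so the check fails for this record, which matches the message printed by convert2annovar.pl.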

itamuria commented 5 years ago

I solved it. I added the GT key to the FORMAT column and added a 0/0 genotype to the sample data of every record in the VCF file. Now it works.

safsharh commented 2 years ago

> I solved. I added the GT label to the FORMAT and added 0/0 to all the cases in the vcf file. Now it works.

I have the same problem. Can you describe the fix more precisely? Thank you.

kaichop commented 2 years ago

Many programs that call somatic variants do not output genotype calls. In the VCF file, find the two columns corresponding to FORMAT and the sample. Add ":GT" to FORMAT, and then add an arbitrary genotype such as ":0/1" to the sample information, so that each record has a genotype call.

In recent versions of ANNOVAR, table_annovar should be able to handle these issues automatically when you specify the appropriate arguments.
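
The manual edit described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `add_placeholder_gt` is hypothetical, the input is assumed to be tab-delimited, and the genotype is a placeholder rather than a real call. The sketch puts GT first instead of appending it, because the VCF specification requires GT to be the first FORMAT key when it is present. It does not add a ##FORMAT header line for GT.

```python
def add_placeholder_gt(record: str, genotype: str = "0/1") -> str:
    """Prepend GT to FORMAT and a placeholder genotype to every sample column."""
    line = record.rstrip("\n")
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    fields = line.split("\t")
    if len(fields) < 10:
        return line  # no FORMAT/sample columns to edit
    if "GT" in fields[8].split(":"):
        return line  # the record already has a genotype
    fields[8] = "GT:" + fields[8]
    # Every sample column (two for a normal/tumor pair) gets the same placeholder.
    fields[9:] = [genotype + ":" + sample for sample in fields[9:]]
    return "\t".join(fields)


# Shortened, made-up record with two sample columns, for illustration only.
example = "chr1\t13418\t.\tG\tA\t.\tLowEVS\tSOMATIC\tDP:FDP\t83:0\t155:0"
converted = add_placeholder_gt(example)
print(converted)
```

To convert a whole file, the same function can be applied to each line of the input VCF and the results written to a new file.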

safsharh commented 2 years ago

Hi, thank you for your kind reply. Can I modify the file with Notepad++? Regards, Saeid

kaichop commented 2 years ago

Yes, but it will be hard to do manually when there are many variants. You can ask somebody to write a simple script instead. If you provide a few lines of the VCF file so I can see what it looks like, I can probably give you a simple Perl command to do it too.

safsharh commented 2 years ago

Hi, thank you for your kind attention. A few lines from the VCF file, pasted into separate files, can be found in the attachment. Regards, Saeid

##fileformat=VCFv4.2
##FILTER=
##samtoolsVersion=1.10+htslib-1.10.2-3
##samtoolsCommand=samtools mpileup -g -f GRCh38.fna /mnt/h/ngs/exom/my-sorted
##reference=file://GRCh38.fna
##contig=

##ALT=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##FORMAT=
##bcftools_viewVersion=1.10.2+htslib-1.10.2-3
##bcftools_viewCommand=view -v indels /mnt/h/ngs/exom/myraw.bcf; Date=Fri Oct 15 14:52:47 2021
##bcftools_viewCommand=view /mnt/h/ngs/exom/my-var.bcf; Date=Fri Oct 15 14:53:16 2021
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT /mnt/h/ngs/exom/my-sorted

NC_000001.11 10105 . accc aCccc 0 . INDEL;IDV=2;IMF=0.0645161;DP=31;I16=12,5,1,0,411,10777,22,484,10,100,30,900,291,6461,25,625;QS=0.770833,0.229167;VDB=5.15001e-05;SGB=-0.379885;MQSB=0.885629;MQ0F=0.774194 PL 0,32,8
NC_000001.11 267777 . caaaaaaaaaaaaaaaa caaaaaaaaaaaaaaa 0 . INDEL;IDV=1;IMF=0.2;DP=5;I16=2,0,1,0,62,1922,25,625,0,0,0,0,50,1250,25,625;QS=0.666667,0.333333;VDB=0.0249187;SGB=-0.379885;MQ0F=1 PL 0,3,3
NC_000001.11 626805 . tg t 0 . INDEL;IDV=1;IMF=0.0625;DP=16;I16=6,3,0,1,231,6521,34,1156,0,0,0,0,119,2593,8,64;QS=0.9,0.1;VDB=5.18514e-05;SGB=-0.379885;MQSB=1.00775;MQ0F=1 PL 0,20,16
NC_000001.11 631147 . GA GAA 0 . INDEL;IDV=4;IMF=1;DP=4;I16=0,0,2,0,0,0,76,2888,0,0,0,0,0,0,50,1250;QS=0,1;VDB=0.02;SGB=-0.453602;MQ0F=1 PL 7,6,0
NC_000001.11 759469 . ac aCCCCc 0 . INDEL;IDV=1;IMF=0.0454545;DP=22;I16=11,7,1,0,920,50134,95,9025,0,0,0,0,296,6002,22,484;QS=0.947368,0.0526316;VDB=2.21632e-05;SGB=-0.379885;MQSB=1;MQ0F=1 PL 0,44,21
NC_000001.11 759506 . C CTCTG 0 . INDEL;IDV=1;IMF=0.142857;DP=7;I16=2,1,1,0,133,8757,113,12769,0,0,0,0,75,1875,15,225;QS=0.75,0.25;VDB=0.0395055;SGB=-0.379885;MQSB=1;MQ0F=1 PL 0,5,6
NC_000001.11 775687 . caaaaaa caaaaa 0 . INDEL;IDV=2;IMF=1;DP=2;I16=0,0,1,0,0,0,29,841,0,0,0,0,0,0,25,625;QS=0,1;SGB=-0.379885;MQ0F=1 PL 4,3,0
NC_000001.11 791620 . g gGTTCA 0 . INDEL;IDV=4;IMF=1;DP=4;I16=0,0,2,2,0,0,450,53426,0,0,220,12400,0,0,100,2500;QS=0,1;VDB=0.00835905;SGB=-0.556411;MQSB=0.5;MQ0F=0 PL 203,12,0
NC_000001.11 1010306 . gcc gCcc 0 . INDEL;IDV=4;IMF=0.8;DP=5;I16=1,0,0,2,33,1089,78,3042,60,3600,120,7200,3,9,50,1250;QS=0.297297,0.702703;VDB=0.0349627;SGB=-0.453602;MQSB=1;MQ0F=0 PL 63,0,24
NC_000001.11 1265591 . gttttctctccatt gt 0 . INDEL;IDV=1;IMF=0.2;DP=5;I16=1,1,0,1,309,62021,239,57121,120,7200,39,1521,50,1250,25,625;QS=0.754717,0.245283;VDB=0.142856;SGB=-0.379885;MQSB=1;MQ0F=0.4 PL 30,0,111
NC_000001.11 1324904 . G GT 0 . INDEL;IDV=2;IMF=1;DP=2;I16=0,0,0,2,0,0,77,2977,0,0,120,7200,0,0,10,50;QS=0,1;VDB=0.02;SGB=-0.453602;MQ0F=0 PL 70,6,0
NC_000001.11 1379083 . atttttttttt attttttttt 0 . INDEL;IDV=4;IMF=1;DP=4;I16=0,0,2,2,0,0,83,1963,0,0,240,14400,0,0,96,2308;QS=0,1;VDB=0.510154;SGB=-0.556411;MQSB=1;MQ0F=0 PL 36,12,0
NC_000001.11 1443699 . t tCC 0 . INDEL;IDV=1;IMF=0.0116279;DP=86;I16=81,0,1,0,4284,227958,38,1444,4,16,3,9,2025,50625,25,625;QS=0.987805,0.0121951;VDB=1.82169e-44;SGB=-0.379885;MQ0F=0.976744 PL 0,228,9
NC_000001.11 1443710 . t tGTCTATA 0 . INDEL;IDV=1;IMF=0.0117647;DP=85;I16=81,0,1,0,9594,1.14456e+06,168,28224,4,16,0,0,1377,23409,25,625;QS=0.987805,0.0121951;VDB=9.94922e-44;SGB=-0.379885;MQ0F=0.976471 PL 0,228,9
NC_000001.11 1461193 . attttttttttt atttttttttt 0 . INDEL;IDV=1;IMF=0.5;DP=2;I16=0,0,0,1,0,0,25,625,0,0,60,3600,0,0,12,144;QS=0,1;SGB=-0.379885;MQ0F=0 PL 9,3,0
NC_000001.11 1485295 . CT C 0 . INDEL;IDV=8;IMF=1;DP=8;I16=0,0,0,1,0,0,16,256,0,0,0,0,0,0,25,625;QS=0,1;SGB=-0.379885;MQ0F=1 PL 4,3,0
NC_000001.11 1502909 . aatatatatatatata aatatatatatata 0 . INDEL;IDV=2;IMF=0.142857;DP=14;I16=6,0,1,1,336,18816,58,2194,360,21600,49,1681,150,3750,50,1250;QS=0.872727,0.127273;VDB=0.000179998;SGB=-0.453602;MQSB=1;MQ0F=0 PL 25,0,201

maximus3219 commented 8 months ago

> For software that call somatic variants, many of them do not give genotype calls. So in the VCF file, you find the two columns corresponding to FORMAT and the sample. Add a ":GT" to FORMAT, and then add a random genotype such as ":0/1" to the sample information. So you give it a genotype call.

How exactly can I add the "GT" field? Thanks.

maximus3219 commented 8 months ago

> I solved. I added the GT label to the FORMAT and added 0/0 to all the cases in the vcf file. Now it works.

How exactly did you add the GT label?

kaichop commented 6 months ago

This command should work:

perl -pe 's#\tPL\t#\tGT:PL\t0/1:#' < inputfile > outputfile

It writes an output file in which something like "PL 25,0,201" becomes "GT:PL 0/1:25,0,201".
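
For readers without Perl, the same substitution can be sketched in Python. This is an illustration rather than part of ANNOVAR: the function name `pl_to_gt_pl` is hypothetical, the record is a shortened version of the last line posted above with tabs restored, and it assumes a single-sample VCF whose FORMAT column is exactly PL.

```python
import re

# Matches the tab-delimited FORMAT column when it holds only "PL".
PL_ONLY = re.compile(r"\tPL\t")


def pl_to_gt_pl(record: str, genotype: str = "0/1") -> str:
    """Rewrite '<tab>PL<tab>' as '<tab>GT:PL<tab><genotype>:' on one VCF line."""
    # A function is used as the replacement so the text is inserted literally.
    return PL_ONLY.sub(lambda m: "\tGT:PL\t" + genotype + ":", record, count=1)


# Shortened record (INFO truncated for illustration), tabs restored.
record = "NC_000001.11\t1502909\t.\taatatatatatatata\taatatatatatata\t0\t.\tINDEL;IDV=2\tPL\t25,0,201"
fixed = pl_to_gt_pl(record)
print(fixed)
```

Header lines are left unchanged because they do not contain a tab-delimited PL column. The 0/1 genotype is only a placeholder that lets the file pass the GT check; it is not a real genotype call.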