WGLab / doc-ANNOVAR

Documentation for the ANNOVAR software
http://annovar.openbioinformatics.org

Died at /work/Software/Download/Variant_Package/annovar/coding_change.pl line 553, <FASTA> line 149454. #34

Closed haiwufan closed 6 years ago

haiwufan commented 6 years ago

Hi Developer! I tested the latest ANNOVAR and got an error.

The log is below:

$ table_annovar.pl $Sample.combined.vcf $Anno_db --vcfinput -buildver hg38 -out $Sample --checkfile --otherinfo -remove -polish -protocol cytoBand,refGeneWithVer,ensGene,knownGene -operation r,g,g,g -nastring .

NOTICE: Running with system command <convert2annovar.pl -includeinfo -allsample -withfreq -format vcf4 N0202G2.combined.vcf > N0202G2.avinput>
NOTICE: Finished reading 365921 lines from VCF file
NOTICE: A total of 362523 locus in VCF file passed QC threshold, representing 346476 SNPs (239967 transitions and 106509 transversions) and 16547 indels/substitutions
NOTICE: Finished writing allele frequencies based on 346476 SNP genotypes (239967 transitions and 106509 transversions) and 16547 indels/substitutions for 1 samples

NOTICE: Running with system command </work/Software/Download/Variant_Package/annovar/table_annovar.pl N0202G2.avinput /work/Database/Annovar_db/hg38_20180130 -buildver hg38 -outfile N0202G2 --checkfile --otherinfo -remove -polish -protocol cytoBand,refGeneWithVer,ensGene,knownGene -operation r,g,g,g -nastring . -otherinfo>

NOTICE: Processing operation=r protocol=cytoBand

NOTICE: Running with system command <annotate_variation.pl -regionanno -dbtype cytoBand -buildver hg38 -outfile N0202G2 N0202G2.avinput /work/Database/Annovar_db/hg38_20180130>
NOTICE: Output file is written to N0202G2.hg38_cytoBand
NOTICE: Reading annotation database /work/Database/Annovar_db/hg38_20180130/hg38_cytoBand.txt ... Done with 1293 regions
NOTICE: Finished region-based annotation on 363002 genetic variants
NOTICE: Variants with invalid input format were written to N0202G2.invalid_input

NOTICE: Processing operation=g protocol=refGeneWithVer

NOTICE: Running with system command <annotate_variation.pl -geneanno -buildver hg38 -dbtype refGeneWithVer -outfile N0202G2.refGeneWithVer -exonsort N0202G2.avinput /work/Database/Annovar_db/hg38_20180130>
NOTICE: Output files were written to N0202G2.refGeneWithVer.variant_function, N0202G2.refGeneWithVer.exonic_variant_function
NOTICE: Reading gene annotation from /work/Database/Annovar_db/hg38_20180130/hg38_refGeneWithVer.txt ... Done with 74727 transcripts (including 18443 without coding sequence annotation) for 28059 unique genes
NOTICE: Processing next batch with 363002 unique variants in 363002 input lines
NOTICE: Reading FASTA sequences from /work/Database/Annovar_db/hg38_20180130/hg38_refGeneWithVerMrna.fa ... Done with 21138 sequences
WARNING: A total of 526 sequences will be ignored due to lack of correct ORF annotation
NOTICE: Variants with invalid input format were written to N0202G2.refGeneWithVer.invalid_input

NOTICE: Running with system command <coding_change.pl N0202G2.refGeneWithVer.exonic_variant_function.orig /work/Database/Annovar_db/hg38_20180130/hg38_refGeneWithVer.txt /work/Database/Annovar_db/hg38_20180130/hg38_refGeneWithVerMrna.fa -alltranscript -out N0202G2.refGeneWithVer.fa -newevf N0202G2.refGeneWithVer.exonic_variant_function>
Died at /work/Software/Download/Variant_Package/annovar/coding_change.pl line 553, <FASTA> line 149454.
Error running system command: <coding_change.pl N0202G2.refGeneWithVer.exonic_variant_function.orig /work/Database/Annovar_db/hg38_20180130/hg38_refGeneWithVer.txt /work/Database/Annovar_db/hg38_20180130/hg38_refGeneWithVerMrna.fa -alltranscript -out N0202G2.refGeneWithVer.fa -newevf N0202G2.refGeneWithVer.exonic_variant_function>
Error running system command: </work/Software/Download/Variant_Package/annovar/table_annovar.pl N0202G2.avinput /work/Database/Annovar_db/hg38_20180130 -buildver hg38 -outfile N0202G2 --checkfile --otherinfo -remove -polish -protocol cytoBand,refGeneWithVer,ensGene,knownGene -operation r,g,g,g -nastring . -otherinfo>

I then checked the temporary file $Sample.refGeneWithVer.fa and found:

$ tail $Sample.refGeneWithVer.fa
LNLGIFASRLYYHWCKPQQKGLRLLCGSQVPVEVMGFPEFADCWENFVDHEKPLSFNPYKMLEELDKNSRAIKRRLERIKQS*
line343937 NM_004711.4 WILDTYPE
MEGGAYGAGKAGGAFDPYTLVRQPHTILRVVSWLFSIVVFGSIVNEGYLNSASEGEEFCIYNRNPNACSYGVAVGVLAFLTCLLYLALDVYFPQISSVKD
RKKAVLSDIGVSAFWAFLWFVGFCYLANQWQVSKPKDNPLNEGTDAARAAIAFSFFSIFTWAGQAVLAFQRYQIGADSALFSQDYMDPSQDSSMPYAPYV
EPTGPDPAGMGGTYQQPANTFDTEPQGYQSQGY
line343937 NM_004711.4 c.605_606insCAA p.P202_T203insN protein-altering (position 202-203 has insertion N)
MEGGAYGAGKAGGAFDPYTLVRQPHTILRVVSWLFSIVVFGSIVNEGYLNSASEGEEFCIYNRNPNACSYGVAVGVLAFLTCLLYLALDVYFPQISSVKD
RKKAVLSDIGVSAFWAFLWFVGFCYLANQWQVSKPKDNPLNEGTDAARAAIAFSFFSIFTWAGQAVLAFQRYQIGADSALFSQDYMDPSQDSSMPYAPYV
EPNTGPDPAGMGGTYQQPANTFDTEPQGYQSQGY
WARNING: invalid triplets found in DNA sequence to be translated: in

I then looked up line343937 in the $Sample.refGeneWithVer.exonic_variant_function.orig file, but did not find any problem.

$ grep "NM_004711" $Sample.refGeneWithVer.exonic_variant_function.orig
line343937 nonframeshift insertion SYNGR1:NM_004711.4:exon4:c.605_606insCAA:p.P202delinsPN chr22 39381817 39381817 - CAA 1 9966.73 223 chr22 39381817 rs149306472 C CCAA 9966.73 PASS AC=2;AF=1.00;AN=2;DB;DP=230;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.000;MQ=60.00;QD=44.69;SOR=1.179;set=variant2 GT:AD:DP:GQ:PL 1/1:0,223:223:99:10004,673,0

haiwufan commented 6 years ago

$ tail $Sample.refGeneWithVer.fa


kaichop commented 6 years ago

Do you mean you get the problem with the following variant?

chr22 39381817 39381817 - CAA

On Thu, Jun 7, 2018 at 9:50 PM, haiwufan notifications@github.com wrote:


haiwufan commented 6 years ago

Hi Kaichop, I annotated the "chr22 39381817 39381817 - CAA" variant on its own and it works fine. I then split the avinput file by chromosome and annotated each part separately with ANNOVAR. This confirmed that the error happens on chrX, but I still can't determine which variant position causes it.

I have uploaded my chrX avinput file, the temporary files and the logs.

$ /work/Software/annovar_2018-04-16/table_annovar.pl ../chrX.avinput $Anno_db -buildver hg38 -out chrX --checkfile --otherinfo -polish -protocol cytoBand,refGeneWithVer,ensGene,knownGene -operation r,g,g,g -nastring -

Bug_Report.zip
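For anyone who wants to repeat the per-chromosome isolation described above, here is a minimal Python sketch. It is not part of ANNOVAR; the function name `split_avinput_by_chrom` and the `<chrom>.avinput` output naming are assumptions made for illustration. It only assumes that the first whitespace-separated column of each avinput line is the chromosome.

```python
import os
import tempfile
from collections import defaultdict


def split_avinput_by_chrom(avinput_path, out_dir):
    """Write one <chrom>.avinput file per chromosome; return {chrom: path}.

    Hypothetical helper, not an ANNOVAR tool. All lines are held in memory,
    which is fine for a few hundred thousand variants.
    """
    lines_by_chrom = defaultdict(list)
    with open(avinput_path) as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            chrom = line.split(None, 1)[0]  # first column is the chromosome
            lines_by_chrom[chrom].append(line)

    paths = {}
    for chrom, lines in lines_by_chrom.items():
        path = os.path.join(out_dir, f"{chrom}.avinput")
        with open(path, "w") as out:
            out.writelines(lines)
        paths[chrom] = path
    return paths


if __name__ == "__main__":
    # Tiny demo built from the two variants mentioned in this thread.
    demo = "chr22\t39381817\t39381817\t-\tCAA\nchrX\t1403271\t1403271\tA\t-\n"
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "demo.avinput")
        with open(src, "w") as handle:
            handle.write(demo)
        for chrom, path in sorted(split_avinput_by_chrom(src, tmp).items()):
            print(chrom, os.path.basename(path))
```

Each resulting file can then be passed to table_annovar.pl on its own, as in the chrX command above, to see which chromosome triggers the failure.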

haiwufan commented 6 years ago

I found that the variant "chrX 1403271 1403271 A -" causes this error.

My hg38_refGeneWithVer.txt contains this position: hg38_refGeneWithVer.xlsx
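A quick way to check which transcripts in a refGene-style table cover a given position is sketched below. This is an illustration, not ANNOVAR code: the function name `transcripts_overlapping` is made up, the demo rows are hypothetical (they are not real ASMTL records), and the column layout is assumed to be the UCSC refGene one (bin, name, chrom, strand, txStart, txEnd, ..., name2 in column 13) with 0-based starts and 1-based ends.

```python
def transcripts_overlapping(refgene_lines, chrom, pos):
    """Return (transcript, gene, strand) for rows whose transcript spans pos.

    pos is 1-based, as in an avinput file. txStart is assumed 0-based and
    txEnd 1-based (UCSC convention), so the test is txStart < pos <= txEnd.
    """
    hits = []
    for line in refgene_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 13 or fields[2] != chrom:
            continue  # malformed row or different chromosome
        tx_start, tx_end = int(fields[4]), int(fields[5])
        if tx_start < pos <= tx_end:
            hits.append((fields[1], fields[12], fields[3]))
    return hits


# Hypothetical rows in refGene column order; names and coordinates are invented.
DEMO_ROWS = [
    "\t".join(["0", "NM_DEMO1.1", "chrX", "-", "1000", "2000", "1100", "1900",
               "1", "1000,", "2000,", "0", "GENE_A", "cmpl", "cmpl", "0,"]),
    "\t".join(["0", "NM_DEMO2.1", "chrX", "+", "5000", "6000", "5100", "5900",
               "1", "5000,", "6000,", "0", "GENE_B", "cmpl", "cmpl", "0,"]),
]

if __name__ == "__main__":
    for name, gene, strand in transcripts_overlapping(DEMO_ROWS, "chrX", 1500):
        print(name, gene, strand)
```

With a real table, the lines would come from opening hg38_refGeneWithVer.txt and the query would be `("chrX", 1403271)`.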

kaichop commented 6 years ago

I cannot reproduce the problem. My command and output are below. Where did you get the WithVer file? That must be the wrong file. Please read FAQ #1 and provide all necessary information.

table_annovar.pl temp project/annotate_variation/humandb/ -buildver hg38 --checkfile --otherinfo -remove -polish -protocol refGene -operation g -nastring .

chrX 1403271 1403271 A - exonic ASMTL . stoploss ASMTL:NM_001173474:exon12:c.1816delT:p.606delinsEAQAACSL,ASMTL:NM_001173473:exon13:c.1690delT:p.564delinsEAQAACSL,ASMTL:NM_004192:exon13:c.1864delT:p.622delinsEAQAACSL

On Fri, Jun 8, 2018 at 3:42 AM, haiwufan notifications@github.com wrote:

I found "chrX 1403271 1403271 A - " variant will cause this error.

My hg38_refGeneWithVer.txt info contain this position: hg38_refGeneWithVer.xlsx https://github.com/WGLab/doc-ANNOVAR/files/2083388/hg38_refGeneWithVer.xlsx


haiwufan commented 6 years ago

Thanks kaichop,

I have found the cause. When I built hg38_refGeneMrna.fa from hg38_refGene.txt, I used the hg38 reference genome FASTA file from the GATK bundle. That file masks some regions of the chromosomes with 'N', so the hg38_refGeneMrna.fa I generated was wrong. I then rebuilt hg38_refGeneMrna.fa using the reference genome FASTA from UCSC, which solved the problem.

Thanks again!
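To spot this kind of problem before running the annotation, one option is to scan the generated mRNA FASTA for records that contain 'N'. The sketch below is a standalone check, not an ANNOVAR feature; the function name `records_with_n` is an assumption, and it treats any 'N' or 'n' in a sequence line as a masked base.

```python
def records_with_n(fasta_lines):
    """Yield (header, n_count) for each FASTA record containing N or n.

    fasta_lines can be any iterable of lines, for example an open file handle.
    Lines before the first '>' header are ignored.
    """
    header, n_count = None, 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None and n_count:
                yield header, n_count  # report the record just finished
            header, n_count = line[1:], 0
        elif header is not None:
            n_count += line.upper().count("N")
    if header is not None and n_count:
        yield header, n_count  # report the last record


if __name__ == "__main__":
    # Invented demo records; a real run would iterate over the mRNA FASTA file.
    demo = [">tx_clean", "ATGGCC", "TAA", ">tx_masked", "ATGNNN", "nTAA"]
    for header, count in records_with_n(demo):
        print(f"{header}: {count} masked bases")
```

Any record reported here would be a candidate for the translation failure seen in coding_change.pl, under the assumption that masked bases in the transcript sequence are what break codon translation.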