WGLab / doc-ANNOVAR

Documentation for the ANNOVAR software
http://annovar.openbioinformatics.org

-xref annotation issues with new version #19

Closed rbutleriii closed 7 years ago

rbutleriii commented 7 years ago

Hello,

I am trying to run table_annovar.pl with the new -xref and -polish options and am running into a problem. Update: it appears to be a -polish problem; without that switch it runs without issue.

table_annovar.pl avinput.temp /ghi/butlerr/opt/annovar/humandb/ -buildver hg19 -out rslist -remove -protocol refGene,avsnp147,dbnsfp33a,exac03,gnomad_genome,intervar_20170202 -operation gx,f,f,f,f,f -nastring "-" -polish -xref /ghi/butlerr/opt/annovar/example/gene_fullxref.txt
-----------------------------------------------------------------
NOTICE: Processing operation=gx protocol=refGene

NOTICE: Running with system command <annotate_variation.pl -geneanno -buildver hg19 -dbtype refGene -outfile rslist.refGene -exonsort avinput.temp /ghi/butlerr/opt/annovar/humandb/>
NOTICE: Output files were written to rslist.refGene.variant_function, rslist.refGene.exonic_variant_function
NOTICE: Reading gene annotation from /ghi/butlerr/opt/annovar/humandb/hg19_refGene.txt ... Done with 63481 transcripts (including 15216 without coding sequence annotation) for 27720 unique genes
NOTICE: Processing next batch with 377 unique variants in 377 input lines
NOTICE: Reading FASTA sequences from /ghi/butlerr/opt/annovar/humandb/hg19_refGeneMrna.fa ... Done with 22 sequences
WARNING: A total of 405 sequences will be ignored due to lack of correct ORF annotation

NOTICE: Running with system command <coding_change.pl rslist.refGene.exonic_variant_function.orig /ghi/butlerr/opt/annovar/humandb//hg19_refGene.txt /ghi/butlerr/opt/annovar/humandb//hg19_refGeneMrna.fa -alltranscript -out rslist.refGene.fa -newevf rslist.refGene.exonic_variant_function>
Error: invalid record found in exonic_variant_function file (exonic format error): <line2       frameshift substitution CFTR:NM_000492:exon1:c.-13_10G  7       117120135 117120158       GCGCCCGAGAGACCATGCAGAGGT        G       rs397508136> at /ghi/butlerr/opt/annovar/coding_change.pl line 51, <EVF> line 2.
Error running system command: <coding_change.pl rslist.refGene.exonic_variant_function.orig /ghi/butlerr/opt/annovar/humandb//hg19_refGene.txt /ghi/butlerr/opt/annovar/humandb//hg19_refGeneMrna.fa -alltranscript -out rslist.refGene.fa -newevf rslist.refGene.exonic_variant_function>

It runs with other avinput files, just not this one (the second line seems to be the issue). The body of the file was generated from avsnp147 lines (below):

3 15676984 15676990 GCGGCTG TCC rs80338684
7 117120135 117120158 GCGCCCGAGAGACCATGCAGAGGT G rs397508136
7 117120136 117120158 CGCCCGAGAGACCATGCAGAGGT - rs397508136
7 117120149 117120149 A G rs397508328
7 117120159 117120159 C A rs397508173
7 117120159 117120159 C T rs397508173
7 117120191 117120192 CT C rs397508742
7 117120192 117120192 T - rs397508742
7 117120202 117120202 G T rs397508746
7 117144332 117144332 G A rs397508796
7 117144332 117144332 G C rs397508796
7 117144332 117144332 G T rs397508796
7 117144368 117144368 C T rs397508168
7 117144390 117144390 C A rs151020603
7 117144390 117144390 C T rs151020603
7 117144418 117144418 G A rs397508243
7 117144418 117144418 G C rs397508243
7 117144418 117144418 G T rs397508243
7 117149087 117149087 G A rs397508249
7 117149089 117149089 G A rs397508256
7 117149093 117149093 G A rs397508279
7 117149094 117149094 G A rs121909025
7 117149097 117149097 - A rs397508294
7 117149097 117149097 T TA rs397508294
7 117149101 117149101 G A rs77284892
7 117149101 117149101 G T rs77284892
7 117149123 117149123 C T rs368505753
7 117149146 117149146 C T rs121908749
7 117149150 117149150 G GT rs397508360
7 117149150 117149150 - T rs397508360
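
For anyone hitting a similar error, here is a rough Python sketch for spotting records like the one that coding_change.pl rejected. It rests on an assumption the thread does not confirm: that the trigger is the c.-13_10G notation, whose cDNA position starts upstream of the start codon. The function name `flag_upstream_records` and the regular expression are hypothetical helpers, not part of ANNOVAR. The sketch reads tab-separated lines in the shape shown in the error message (line ID, effect, annotation, then the variant columns).

```python
import re

# Assumption: a cDNA position written as "c.-<number>" lies upstream of the
# start codon. This pattern is illustrative and is not taken from ANNOVAR.
UPSTREAM_CDNA = re.compile(r":c\.-\d+")


def flag_upstream_records(evf_lines):
    """Return (line_id, annotation) pairs whose c. notation starts with a negative position."""
    flagged = []
    for raw in evf_lines:
        fields = raw.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip lines that do not have the id/effect/annotation columns
        line_id, _effect, annotation = fields[:3]
        if UPSTREAM_CDNA.search(annotation):
            flagged.append((line_id, annotation))
    return flagged


# The record quoted in the error message above, with tab-separated fields.
example = [
    "line2\tframeshift substitution\tCFTR:NM_000492:exon1:c.-13_10G\t7\t117120135\t"
    "117120158\tGCGCCCGAGAGACCATGCAGAGGT\tG\trs397508136",
]
print(flag_upstream_records(example))
```

Running the same function over the lines of rslist.refGene.exonic_variant_function.orig would list candidate records to pull out of the avinput file and test separately.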

kaichop commented 7 years ago

I will check this and fix the issue. -Kai


kaichop commented 7 years ago

This is fixed now.