WGLab / doc-ANNOVAR

Documentation for the ANNOVAR software
http://annovar.openbioinformatics.org

Cosmic Annotation is not parsed correctly #83

Closed roselucia closed 4 years ago

roselucia commented 4 years ago

Dear Kai,

I ran your new ANNOVAR version successfully, but I am facing a small parsing bug. I think the COSMIC annotation is not being parsed correctly: "cosmic90_coding=ID\x3dCOSV63870864\x3bOCCURENCE\x3d2(haematopoietic_and_lymphoid_tissue),1(large_intestine)"

(1) Command line argument:
perl table_annovar.pl /Users/rosefroehlich/Desktop/TST170_SnpEffAnnotation/TST170_32a_SnpEffAnnotation.vcf humandb/ -buildver hg19 -out /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a -remove -protocol refGene,ensGene,cytoBand,exac03,gnomad211_genome,gnomad211_exome,1000g2015aug_all,1000g2015aug_eur,avsnp150,dbnsfp35a,cosmic90_coding,cosmic90_noncoding,clinvar_20190305 -operation g,g,r,f,f,f,f,f,f,f,f,f,f -nastring . -vcfinput -polish

(2) Content of Terminal Window:
$ cd /Users/rosefroehlich/Desktop/Annovar_Safari_Download/annovar
Rose-Frohlichs-MacBook-Pro:annovar rosefroehlich$ perl table_annovar.pl /Users/rosefroehlich/Desktop/TST170_SnpEffAnnotation/TST170_32a_SnpEffAnnotation.vcf humandb/ -buildver hg19 -out /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a -remove -protocol refGene,ensGene,cytoBand,exac03,gnomad211_genome,gnomad211_exome,1000g2015aug_all,1000g2015aug_eur,avsnp150,dbnsfp35a,cosmic90_coding,cosmic90_noncoding,clinvar_20190305 -operation g,g,r,f,f,f,f,f,f,f,f,f,f -nastring . -vcfinput -polish

NOTICE: Running with system command <convert2annovar.pl -includeinfo -allsample -withfreq -format vcf4 /Users/rosefroehlich/Desktop/TST170_SnpEffAnnotation/TST170_32a_SnpEffAnnotation.vcf > /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput>
NOTICE: Finished reading 4858 lines from VCF file
NOTICE: A total of 4798 locus in VCF file passed QC threshold, representing 4192 SNPs (2019 transitions and 2173 transversions) and 606 indels/substitutions
NOTICE: Finished writing allele frequencies based on 4192 SNP genotypes (2019 transitions and 2173 transversions) and 606 indels/substitutions for 1 samples

NOTICE: Running with system command <table_annovar.pl /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/ -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a -remove -protocol refGene,ensGene,cytoBand,exac03,gnomad211_genome,gnomad211_exome,1000g2015aug_all,1000g2015aug_eur,avsnp150,dbnsfp35a,cosmic90_coding,cosmic90_noncoding,clinvar_20190305 -operation g,g,r,f,f,f,f,f,f,f,f,f,f -nastring . -polish -otherinfo>

NOTICE: Processing operation=g protocol=refGene

NOTICE: Running with system command <annotate_variation.pl -geneanno -buildver hg19 -dbtype refGene -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.refGene -exonsort -nofirstcodondel /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/>
NOTICE: Output files are written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.refGene.variant_function, /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.refGene.exonic_variant_function
NOTICE: Reading gene annotation from humandb/hg19_refGene.txt ... Done with 72212 transcripts (including 17527 without coding sequence annotation) for 28250 unique genes
NOTICE: Processing next batch with 4798 unique variants in 4798 input lines
NOTICE: Reading FASTA sequences from humandb/hg19_refGeneMrna.fa ... Done with 455 sequences
WARNING: A total of 446 sequences will be ignored due to lack of correct ORF annotation

NOTICE: Running with system command <coding_change.pl /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.refGene.exonic_variant_function.orig humandb//hg19_refGene.txt humandb//hg19_refGeneMrna.fa -alltranscript -out /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.refGene.fa -newevf /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.refGene.exonic_variant_function>

NOTICE: Processing operation=g protocol=ensGene

NOTICE: Running with system command <annotate_variation.pl -geneanno -buildver hg19 -dbtype ensGene -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.ensGene -exonsort -nofirstcodondel /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/>
NOTICE: Output files are written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.ensGene.variant_function, /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.ensGene.exonic_variant_function
NOTICE: Reading gene annotation from humandb/hg19_ensGene.txt ... Done with 196501 transcripts (including 101155 without coding sequence annotation) for 57905 unique genes
NOTICE: Processing next batch with 4798 unique variants in 4798 input lines
NOTICE: Reading FASTA sequences from humandb/hg19_ensGeneMrna.fa ... Done with 586 sequences
WARNING: A total of 6780 sequences will be ignored due to lack of correct ORF annotation

NOTICE: Running with system command <coding_change.pl /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.ensGene.exonic_variant_function.orig humandb//hg19_ensGene.txt humandb//hg19_ensGeneMrna.fa -alltranscript -out /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.ensGene.fa -newevf /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.ensGene.exonic_variant_function>

NOTICE: Processing operation=r protocol=cytoBand

NOTICE: Running with system command <annotate_variation.pl -regionanno -dbtype cytoBand -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/>
NOTICE: Output file is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_cytoBand
NOTICE: Reading annotation database humandb/hg19_cytoBand.txt ... Done with 862 regions
NOTICE: Finished region-based annotation on 4798 genetic variants

NOTICE: Processing operation=f protocol=exac03
NOTICE: Finished reading 8 column headers for '-dbtype exac03'

NOTICE: Running system command <annotate_variation.pl -filter -dbtype exac03 -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/ -otherinfo>
NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_exac03_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_exac03_filtered
NOTICE: Processing next batch with 4798 unique variants in 4798 input lines
NOTICE: Database index loaded. Total number of bins is 749886 and the number of bins to be scanned is 472
NOTICE: Scanning filter database humandb/hg19_exac03.txt...Done

NOTICE: Processing operation=f protocol=gnomad211_genome
NOTICE: Finished reading 17 column headers for '-dbtype gnomad211_genome'

NOTICE: Running system command <annotate_variation.pl -filter -dbtype gnomad211_genome -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/ -otherinfo>
NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_gnomad211_genome_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_gnomad211_genome_filtered
NOTICE: Processing next batch with 4798 unique variants in 4798 input lines
NOTICE: Database index loaded. Total number of bins is 28119985 and the number of bins to be scanned is 917
NOTICE: Scanning filter database humandb/hg19_gnomad211_genome.txt...Done

NOTICE: Processing operation=f protocol=gnomad211_exome
NOTICE: Finished reading 17 column headers for '-dbtype gnomad211_exome'

NOTICE: Running system command <annotate_variation.pl -filter -dbtype gnomad211_exome -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/ -otherinfo>
NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_gnomad211_exome_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_gnomad211_exome_filtered
NOTICE: Processing next batch with 4798 unique variants in 4798 input lines
NOTICE: Database index loaded. Total number of bins is 773145 and the number of bins to be scanned is 474
NOTICE: Scanning filter database humandb/hg19_gnomad211_exome.txt...Done

NOTICE: Processing operation=f protocol=1000g2015aug_all

NOTICE: Running system command <annotate_variation.pl -filter -dbtype 1000g2015aug_all -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/>
NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_ALL.sites.2015_08_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_ALL.sites.2015_08_filtered
NOTICE: Processing next batch with 4798 unique variants in 4798 input lines
NOTICE: Database index loaded. Total number of bins is 2824642 and the number of bins to be scanned is 669
NOTICE: Scanning filter database humandb/hg19_ALL.sites.2015_08.txt...Done

NOTICE: Processing operation=f protocol=1000g2015aug_eur

NOTICE: Running system command <annotate_variation.pl -filter -dbtype 1000g2015aug_eur -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/>
NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_EUR.sites.2015_08_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_EUR.sites.2015_08_filtered
NOTICE: Processing next batch with 4798 unique variants in 4798 input lines
NOTICE: Database index loaded. Total number of bins is 2812033 and the number of bins to be scanned is 668
NOTICE: Scanning filter database humandb/hg19_EUR.sites.2015_08.txt...Done

NOTICE: Processing operation=f protocol=avsnp150

NOTICE: Running system command <annotate_variation.pl -filter -dbtype avsnp150 -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/>
NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_avsnp150_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_avsnp150_filtered
NOTICE: Processing next batch with 4798 unique variants in 4798 input lines
NOTICE: Database index loaded. Total number of bins is 28258790 and the number of bins to be scanned is 917
NOTICE: Scanning filter database humandb/hg19_avsnp150.txt...Done

NOTICE: Processing operation=f protocol=dbnsfp35a
NOTICE: Finished reading 70 column headers for '-dbtype dbnsfp35a'

NOTICE: Running system command <annotate_variation.pl -filter -dbtype dbnsfp35a -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/ -otherinfo>
NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_dbnsfp35a_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_dbnsfp35a_filtered
NOTICE: Processing next batch with 4798 unique variants in 4798 input lines
NOTICE: Database index loaded. Total number of bins is 550512 and the number of bins to be scanned is 456
NOTICE: Scanning filter database humandb/hg19_dbnsfp35a.txt...Done

NOTICE: Processing operation=f protocol=cosmic90_coding

NOTICE: Running system command <annotate_variation.pl -filter -dbtype cosmic90_coding -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/>
NOTICE: the --dbtype cosmic90_coding is assumed to be in generic ANNOVAR database format
NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_cosmic90_coding_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_cosmic90_coding_filtered
NOTICE: Processing next batch with 4798 unique variants in 4798 input lines
NOTICE: Scanning filter database humandb/hg19_cosmic90_coding.txt...Done

NOTICE: Processing operation=f protocol=cosmic90_noncoding

NOTICE: Running system command <annotate_variation.pl -filter -dbtype cosmic90_noncoding -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/>
NOTICE: the --dbtype cosmic90_noncoding is assumed to be in generic ANNOVAR database format
NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_cosmic90_noncoding_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_cosmic90_noncoding_filtered
NOTICE: Processing next batch with 4798 unique variants in 4798 input lines
NOTICE: Scanning filter database humandb/hg19_cosmic90_noncoding.txt...Done

NOTICE: Processing operation=f protocol=clinvar_20190305
NOTICE: Finished reading 5 column headers for '-dbtype clinvar_20190305'

NOTICE: Running system command <annotate_variation.pl -filter -dbtype clinvar_20190305 -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/ -otherinfo>
NOTICE: the --dbtype clinvar_20190305 is assumed to be in generic ANNOVAR database format
NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_clinvar_20190305_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_clinvar_20190305_filtered
NOTICE: Processing next batch with 4798 unique variants in 4798 input lines
NOTICE: Database index loaded. Total number of bins is 45822 and the number of bins to be scanned is 317
NOTICE: Scanning filter database humandb/hg19_clinvar_20190305.txt...Done

NOTICE: Multianno output file is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_multianno.txt
NOTICE: Reading from /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_multianno.txt

NOTICE: VCF output is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_multianno.vcf

(4) I use a MacBook Pro (13 inch, Early 2011), 2.3 GHz Intel Core i5, 8 GB 1333 MHz DDR3, Samsung SSD 840 EVO, Intel HD Graphics 3000 512 MB, C02FG0ENDH2L, macOS High Sierra 10.13.6.

Thanks a lot for your help.

All the best, Rose

hsiaoyi0504 commented 4 years ago

I think this has been discussed previously: https://github.com/WGLab/doc-ANNOVAR/issues/41. Check this one as well: https://www.biostars.org/p/266798/
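
For anyone landing here with the same output, here is a minimal post-processing sketch in Python. It assumes the `\x3d` and `\x3b` sequences in the reported value are hex escapes for `=` (0x3d) and `;` (0x3b), presumably written that way because those characters delimit INFO fields in a VCF file. The helper names `decode_hex_escapes` and `parse_cosmic` are hypothetical and not part of ANNOVAR.

```python
import re

# Matches a literal backslash, "x", and two hex digits, e.g. \x3d
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def decode_hex_escapes(value: str) -> str:
    """Replace each literal \\xHH sequence with the character it encodes."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)


def parse_cosmic(value: str) -> dict:
    """Split a decoded 'ID=...;OCCURENCE=...' string into its parts.

    Assumption: OCCURENCE is a comma-separated list of 'count(tissue)' items,
    as in the value quoted in this issue.
    """
    fields = dict(part.split("=", 1) for part in value.split(";") if "=" in part)
    occurrences = {}
    for item in fields.get("OCCURENCE", "").split(","):
        match = re.fullmatch(r"(\d+)\((.+)\)", item)
        if match:
            occurrences[match.group(2)] = int(match.group(1))
    return {"ids": fields.get("ID", "").split(","), "occurrences": occurrences}


# The annotation value quoted in the original report (raw string keeps the backslashes).
raw = r"ID\x3dCOSV63870864\x3bOCCURENCE\x3d2(haematopoietic_and_lymphoid_tissue),1(large_intestine)"
decoded = decode_hex_escapes(raw)
record = parse_cosmic(decoded)
print(decoded)
print(record)
```

Decoding should only be applied to a value after it has been extracted from the INFO column; writing the decoded `=` and `;` back into the VCF would likely make the INFO field ambiguous again.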

roselucia commented 4 years ago

Yes, thank you.