WGLab / doc-ANNOVAR

Documentation for the ANNOVAR software
http://annovar.openbioinformatics.org

Which score is chosen when multiple prediction scores are annotated with dbNSFP3.5? #84

Open roselucia opened 4 years ago

roselucia commented 4 years ago

Hello Kai,

In Issue #73 you answered my question about which prediction score is chosen when multiple scores are annotated with dbNSFP3.0, and told me that it is always the highest rather than the most deleterious. I therefore followed your advice and re-annotated my data with dbNSFP3.5, as the rank score had not yet been introduced in dbNSFP3.0. However, I still have difficulty understanding which prediction score is chosen when multiple scores are annotated. This affects the following scores: MutationTaster, SIFT, Polyphen, FATHMM HDIV and HVAR, Provean, and VEST3.

The calculation of the converted rank score (e.g. PROVEAN_converted_rankscore) is explained in the readme file of dbNSFP3.5 (https://drive.google.com/file/d/0B60wROKy6OqcNGJ2STJlMTJONk0/view). However, I cannot find any information about which prediction and which score are reported when using ANNOVAR (e.g. PROVEAN_pred and PROVEAN_score).

May I ask why ANNOVAR selects the following scores for the variant chr17:41244936 G>A?

Provean score given by the ANNOVAR annotation: 5.74

Multiple scores are available in the dbNSFP3.5 database for the following transcripts: ENST00000357654, ENST00000493795, ENST00000471181, ENST00000354071

5.74,5.7,5.84,5.71

According to the dbNSFP3.5 readme file, the most deleterious score is the smallest one, which would be 5.7 rather than 5.74. Why does ANNOVAR annotate 5.74?

MutationTaster score given by the ANNOVAR annotation: 0.996

Multiple scores are available in the dbNSFP3.5 database:

0.996297,0.996297,0.996698,0.996698,0.996297,0.996297,2.19253e-13,2.19253e-13,2.19253e-13,2.19253e-13,2.19253e-13,2.19253e-13,2.19253e-13,2.19253e-13

How does ANNOVAR select a MutationTaster_score when multiple scores (p-values) are given?
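One way to narrow the question down is to test a few candidate selection rules against the values quoted above. The following is only a sketch: the rule names (`first`, `max`, `min`) and the helper functions are hypothetical, and the rounding precision is an assumption taken from the number of decimals shown in the annotated output. It does not describe how ANNOVAR or the dbnsfp35a table actually picks a value.

```python
from typing import Callable, Dict, List


def parse_scores(field: str) -> List[float]:
    """Split a multi-value score field into floats.

    Accepts ',' or ';' as separator and skips '.' placeholders.
    """
    tokens = field.replace(";", ",").split(",")
    return [float(tok) for tok in tokens if tok.strip() not in ("", ".")]


# Hypothetical candidate rules; none of these is confirmed ANNOVAR behaviour.
RULES: Dict[str, Callable[[List[float]], float]] = {
    "first": lambda xs: xs[0],
    "max": max,
    "min": min,
}


def matching_rules(field: str, reported: float, decimals: int) -> List[str]:
    """Return the names of the rules whose result equals the reported score
    after rounding both to the given number of decimals."""
    scores = parse_scores(field)
    target = round(reported, decimals)
    return [name for name, rule in RULES.items()
            if round(rule(scores), decimals) == target]


# Values copied from the question above.
provean = "5.74,5.7,5.84,5.71"
mutationtaster = ",".join(
    ["0.996297", "0.996297", "0.996698", "0.996698", "0.996297", "0.996297"]
    + ["2.19253e-13"] * 8
)

print(matching_rules(provean, 5.74, 2))
print(matching_rules(mutationtaster, 0.996, 3))
```

For these two examples only the `first` rule reproduces the annotated value after rounding. That is an observation about the numbers quoted in this issue, in the order they are listed here, and not a general statement about the database.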

Thanks again for your help!

1) Command-line argument:

perl table_annovar.pl /Users/rosefroehlich/Desktop/TST170_SnpEffAnnotation/TST170_32a_SnpEffAnnotation.vcf humandb/ -buildver hg19 -out /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a -remove -protocol refGene,ensGene,cytoBand,exac03,gnomad211_genome,gnomad211_exome,1000g2015aug_all,1000g2015aug_eur,avsnp150,dbnsfp35a,cosmic90_coding,cosmic90_noncoding,clinvar_20190305 -operation g,g,r,f,f,f,f,f,f,f,f,f,f -nastring . -vcfinput -polish
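Since the `-protocol` and `-operation` lists are matched by position, it can help to pair them up explicitly and confirm that dbnsfp35a is run as a filter-based (`f`) annotation. The sketch below does this for the two lists from the command above; the function name `pair_protocols` is hypothetical and is not part of ANNOVAR.

```python
from typing import List, Tuple


def pair_protocols(protocol_arg: str, operation_arg: str) -> List[Tuple[str, str]]:
    """Pair each protocol with its operation, matched by position."""
    protocols = protocol_arg.split(",")
    operations = operation_arg.split(",")
    if len(protocols) != len(operations):
        raise ValueError(
            f"{len(protocols)} protocols but {len(operations)} operations"
        )
    return list(zip(protocols, operations))


# Copied from the command line above.
PROTOCOL = ("refGene,ensGene,cytoBand,exac03,gnomad211_genome,gnomad211_exome,"
            "1000g2015aug_all,1000g2015aug_eur,avsnp150,dbnsfp35a,"
            "cosmic90_coding,cosmic90_noncoding,clinvar_20190305")
OPERATION = "g,g,r,f,f,f,f,f,f,f,f,f,f"

pairs = pair_protocols(PROTOCOL, OPERATION)
for protocol, operation in pairs:
    print(f"{operation}\t{protocol}")
```

In the log that follows, the operations `g`, `r`, and `f` correspond to the `-geneanno`, `-regionanno`, and `-filter` calls to annotate_variation.pl.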

2) Error message on screen:

Last login: Sun Dec 8 15:16:35 on ttys000

Rose-Frohlichs-MacBook-Pro:~ rosefroehlich$ cd /Users/rosefroehlich/Desktop/Annovar_Safari_Download/annovar

Rose-Frohlichs-MacBook-Pro:annovar rosefroehlich$ perl table_annovar.pl /Users/rosefroehlich/Desktop/TST170_SnpEffAnnotation/TST170_32a_SnpEffAnnotation.vcf humandb/ -buildver hg19 -out /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a -remove -protocol refGene,ensGene,cytoBand,exac03,gnomad211_genome,gnomad211_exome,1000g2015aug_all,1000g2015aug_eur,avsnp150,dbnsfp35a,cosmic90_coding,cosmic90_noncoding,clinvar_20190305 -operation g,g,r,f,f,f,f,f,f,f,f,f,f -nastring . -vcfinput -polish

NOTICE: Running with system command <convert2annovar.pl -includeinfo -allsample -withfreq -format vcf4 /Users/rosefroehlich/Desktop/TST170_SnpEffAnnotation/TST170_32a_SnpEffAnnotation.vcf > /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput>

NOTICE: Finished reading 4853 lines from VCF file

NOTICE: A total of 4798 locus in VCF file passed QC threshold, representing 4192 SNPs (2019 transitions and 2173 transversions) and 606 indels/substitutions

NOTICE: Finished writing allele frequencies based on 4192 SNP genotypes (2019 transitions and 2173 transversions) and 606 indels/substitutions for 1 samples

NOTICE: Running with system command <table_annovar.pl /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/ -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a -remove -protocol refGene,ensGene,cytoBand,exac03,gnomad211_genome,gnomad211_exome,1000g2015aug_all,1000g2015aug_eur,avsnp150,dbnsfp35a,cosmic90_coding,cosmic90_noncoding,clinvar_20190305 -operation g,g,r,f,f,f,f,f,f,f,f,f,f -nastring . -polish -otherinfo>

NOTICE: Processing operation=g protocol=refGene

NOTICE: Running with system command <annotate_variation.pl -geneanno -buildver hg19 -dbtype refGene -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.refGene -exonsort -nofirstcodondel /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/>

NOTICE: Output files are written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.refGene.variant_function, /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.refGene.exonic_variant_function

NOTICE: Reading gene annotation from humandb/hg19_refGene.txt ... Done with 72212 transcripts (including 17527 without coding sequence annotation) for 28250 unique genes

NOTICE: Processing next batch with 4798 unique variants in 4798 input lines

NOTICE: Reading FASTA sequences from humandb/hg19_refGeneMrna.fa ... Done with 455 sequences

WARNING: A total of 446 sequences will be ignored due to lack of correct ORF annotation

NOTICE: Running with system command <coding_change.pl /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.refGene.exonic_variant_function.orig humandb//hg19_refGene.txt humandb//hg19_refGeneMrna.fa -alltranscript -out /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.refGene.fa -newevf /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.refGene.exonic_variant_function>

NOTICE: Processing operation=g protocol=ensGene

NOTICE: Running with system command <annotate_variation.pl -geneanno -buildver hg19 -dbtype ensGene -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.ensGene -exonsort -nofirstcodondel /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/>

NOTICE: Output files are written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.ensGene.variant_function, /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.ensGene.exonic_variant_function

NOTICE: Reading gene annotation from humandb/hg19_ensGene.txt ... Done with 196501 transcripts (including 101155 without coding sequence annotation) for 57905 unique genes

NOTICE: Processing next batch with 4798 unique variants in 4798 input lines

NOTICE: Reading FASTA sequences from humandb/hg19_ensGeneMrna.fa ... Done with 586 sequences

WARNING: A total of 6780 sequences will be ignored due to lack of correct ORF annotation

NOTICE: Running with system command <coding_change.pl /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.ensGene.exonic_variant_function.orig humandb//hg19_ensGene.txt humandb//hg19_ensGeneMrna.fa -alltranscript -out /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.ensGene.fa -newevf /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.ensGene.exonic_variant_function>

NOTICE: Processing operation=r protocol=cytoBand

NOTICE: Running with system command <annotate_variation.pl -regionanno -dbtype cytoBand -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/>

NOTICE: Output file is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_cytoBand

NOTICE: Reading annotation database humandb/hg19_cytoBand.txt ... Done with 862 regions

NOTICE: Finished region-based annotation on 4798 genetic variants

NOTICE: Processing operation=f protocol=exac03

NOTICE: Finished reading 8 column headers for '-dbtype exac03'

NOTICE: Running system command <annotate_variation.pl -filter -dbtype exac03 -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/ -otherinfo>

NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_exac03_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_exac03_filtered

NOTICE: Processing next batch with 4798 unique variants in 4798 input lines

NOTICE: Database index loaded. Total number of bins is 749886 and the number of bins to be scanned is 472

NOTICE: Scanning filter database humandb/hg19_exac03.txt...Done

NOTICE: Processing operation=f protocol=gnomad211_genome

NOTICE: Finished reading 17 column headers for '-dbtype gnomad211_genome'

NOTICE: Running system command <annotate_variation.pl -filter -dbtype gnomad211_genome -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/ -otherinfo>

NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_gnomad211_genome_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_gnomad211_genome_filtered

NOTICE: Processing next batch with 4798 unique variants in 4798 input lines

NOTICE: Database index loaded. Total number of bins is 28119985 and the number of bins to be scanned is 917

NOTICE: Scanning filter database humandb/hg19_gnomad211_genome.txt...Done

NOTICE: Processing operation=f protocol=gnomad211_exome

NOTICE: Finished reading 17 column headers for '-dbtype gnomad211_exome'

NOTICE: Running system command <annotate_variation.pl -filter -dbtype gnomad211_exome -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/ -otherinfo>

NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_gnomad211_exome_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_gnomad211_exome_filtered

NOTICE: Processing next batch with 4798 unique variants in 4798 input lines

NOTICE: Database index loaded. Total number of bins is 773145 and the number of bins to be scanned is 474

NOTICE: Scanning filter database humandb/hg19_gnomad211_exome.txt...Done

NOTICE: Processing operation=f protocol=1000g2015aug_all

NOTICE: Running system command <annotate_variation.pl -filter -dbtype 1000g2015aug_all -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/>

NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_ALL.sites.2015_08_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_ALL.sites.2015_08_filtered

NOTICE: Processing next batch with 4798 unique variants in 4798 input lines

NOTICE: Database index loaded. Total number of bins is 2824642 and the number of bins to be scanned is 669

NOTICE: Scanning filter database humandb/hg19_ALL.sites.2015_08.txt...Done

NOTICE: Processing operation=f protocol=1000g2015aug_eur

NOTICE: Running system command <annotate_variation.pl -filter -dbtype 1000g2015aug_eur -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/>

NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_EUR.sites.2015_08_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_EUR.sites.2015_08_filtered

NOTICE: Processing next batch with 4798 unique variants in 4798 input lines

NOTICE: Database index loaded. Total number of bins is 2812033 and the number of bins to be scanned is 668

NOTICE: Scanning filter database humandb/hg19_EUR.sites.2015_08.txt...Done

NOTICE: Processing operation=f protocol=avsnp150

NOTICE: Running system command <annotate_variation.pl -filter -dbtype avsnp150 -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/>

NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_avsnp150_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_avsnp150_filtered

NOTICE: Processing next batch with 4798 unique variants in 4798 input lines

NOTICE: Database index loaded. Total number of bins is 28258790 and the number of bins to be scanned is 917

NOTICE: Scanning filter database humandb/hg19_avsnp150.txt...Done

NOTICE: Processing operation=f protocol=dbnsfp35a

NOTICE: Finished reading 70 column headers for '-dbtype dbnsfp35a'

NOTICE: Running system command <annotate_variation.pl -filter -dbtype dbnsfp35a -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/ -otherinfo>

NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_dbnsfp35a_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_dbnsfp35a_filtered

NOTICE: Processing next batch with 4798 unique variants in 4798 input lines

NOTICE: Database index loaded. Total number of bins is 550512 and the number of bins to be scanned is 456

NOTICE: Scanning filter database humandb/hg19_dbnsfp35a.txt...Done

NOTICE: Processing operation=f protocol=cosmic90_coding

NOTICE: Running system command <annotate_variation.pl -filter -dbtype cosmic90_coding -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/>

NOTICE: the --dbtype cosmic90_coding is assumed to be in generic ANNOVAR database format

NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_cosmic90_coding_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_cosmic90_coding_filtered

NOTICE: Processing next batch with 4798 unique variants in 4798 input lines

NOTICE: Scanning filter database humandb/hg19_cosmic90_coding.txt...Done

NOTICE: Processing operation=f protocol=cosmic90_noncoding

NOTICE: Running system command <annotate_variation.pl -filter -dbtype cosmic90_noncoding -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/>

NOTICE: the --dbtype cosmic90_noncoding is assumed to be in generic ANNOVAR database format

NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_cosmic90_noncoding_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_cosmic90_noncoding_filtered

NOTICE: Processing next batch with 4798 unique variants in 4798 input lines

NOTICE: Scanning filter database humandb/hg19_cosmic90_noncoding.txt...Done

NOTICE: Processing operation=f protocol=clinvar_20190305

NOTICE: Finished reading 5 column headers for '-dbtype clinvar_20190305'

NOTICE: Running system command <annotate_variation.pl -filter -dbtype clinvar_20190305 -buildver hg19 -outfile /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.avinput humandb/ -otherinfo>

NOTICE: the --dbtype clinvar_20190305 is assumed to be in generic ANNOVAR database format

NOTICE: Output file with variants matching filtering criteria is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_clinvar_20190305_dropped, and output file with other variants is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_clinvar_20190305_filtered

NOTICE: Processing next batch with 4798 unique variants in 4798 input lines

NOTICE: Database index loaded. Total number of bins is 45822 and the number of bins to be scanned is 317

NOTICE: Scanning filter database humandb/hg19_clinvar_20190305.txt...Done

NOTICE: Multianno output file is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_multianno.txt

NOTICE: Reading from /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_multianno.txt

NOTICE: VCF output is written to /Users/rosefroehlich/Desktop/TST170_Multianno/TST170_32a.hg19_multianno.vcf

Rose-Frohlichs-MacBook-Pro:annovar rosefroehlich$
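To compare the annotated values with dbNSFP directly, the row for the variant in question can be pulled out of the tab-delimited multianno.txt file named in the log. The sketch below assumes the header uses the column names `Chr`, `Start`, `Ref`, and `Alt` together with the dbNSFP columns `PROVEAN_score` and `MutationTaster_score`; the function `find_variant` is hypothetical, and the two-line sample is a hand-built stand-in containing only the values quoted in this issue, not real output.

```python
import csv
import io
from typing import Dict, Iterable, List


def _strip_chr(name: str) -> str:
    """Normalise 'chr17' and '17' to the same key."""
    return name[3:] if name.lower().startswith("chr") else name


def find_variant(lines: Iterable[str], chrom: str, start: int,
                 ref: str, alt: str) -> List[Dict[str, str]]:
    """Return all rows of a tab-delimited table that match the variant.

    Assumes a header line with the columns Chr, Start, Ref and Alt.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [
        row for row in reader
        if _strip_chr(row["Chr"]) == _strip_chr(chrom)
        and int(row["Start"]) == start
        and row["Ref"] == ref
        and row["Alt"] == alt
    ]


# Hand-built stand-in for the multianno file (values taken from this issue).
sample = (
    "Chr\tStart\tEnd\tRef\tAlt\tPROVEAN_score\tMutationTaster_score\n"
    "chr17\t41244936\t41244936\tG\tA\t5.74\t0.996\n"
)

hits = find_variant(io.StringIO(sample), "chr17", 41244936, "G", "A")
for row in hits:
    print(row["PROVEAN_score"], row["MutationTaster_score"])
```

With a real file, the `io.StringIO(sample)` argument would be replaced by an open file handle, for example one opened on TST170_32a.hg19_multianno.txt.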

3) Example input file: sent via email

4) Operating system: Mac

Thanks for your help.