Yes, the most deleterious one is used as the score.
On Fri, Sep 13, 2019 at 7:58 AM roselucia notifications@github.com wrote:
Hello,
in the technical comment in the PolyPhen-2 section, it is described that if more than one score exists (due to multiple isoforms), only the largest score (the most deleterious) is used in the annotation. How does ANNOVAR deal with multiple scores when annotating the scores of the dbnsfp30a dataset, such as SIFT, MutationTaster, etc.? Is the most deleterious score (the one with the highest impact) always used?
Thanks! Best regards, Rose
Many thanks! May I ask why ANNOVAR selects the following scores for the variant chr17:41244936 G>A? The PROVEAN score given by the ANNOVAR annotation is 5.84. Multiple scores are available for the following transcripts in the dbNSFP3.0 database: ENST00000357654, ENST00000493795, ENST00000471181, ENST00000354071 | 5.74, 5.7, 5.84, 5.71. According to the dbNSFP3.0 readme file, the most deleterious score is the smallest one, which would be 5.7, not 5.84. Why does ANNOVAR annotate 5.84? Is the most deleterious score really always used, even when it is the smallest score given for a variant?
The MutationTaster score given by the ANNOVAR annotation is 0.997. Multiple scores are available for the following amino acid changes in the dbNSFP3.0 database: P824L, P871L, P871L, P871L, P575L, P871L | 0.996297, 0.996297, 0.996698, 0.996698, 0.996297, 0.996297. Does ANNOVAR always annotate the greatest p-value?
Thanks again for your help!
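The two observations in this message can be checked directly against the quoted numbers. A small sketch, using only the values copied from the message (the dictionary and list are just convenient containers): it finds which transcript holds the smallest and the largest raw PROVEAN score, and rounds the extreme MutationTaster values to the three decimals shown in the annotation.

```python
# Per-transcript PROVEAN scores for chr17:41244936 G>A, copied from the message above
provean = {
    "ENST00000357654": 5.74,
    "ENST00000493795": 5.7,
    "ENST00000471181": 5.84,
    "ENST00000354071": 5.71,
}
# MutationTaster scores listed for the same variant, in the quoted order
mutationtaster = [0.996297, 0.996297, 0.996698, 0.996698, 0.996297, 0.996297]

annotated_provean = 5.84           # value reported by ANNOVAR with dbnsfp30a
annotated_mutationtaster = "0.997"

# Transcripts holding the smallest and the largest raw PROVEAN score
smallest = min(provean, key=provean.get)
largest = max(provean, key=provean.get)
print("smallest:", smallest, provean[smallest])
print("largest: ", largest, provean[largest])
print("annotated PROVEAN is the largest:", provean[largest] == annotated_provean)

# Round the extremes of the MutationTaster list to three decimals
max_mt = f"{max(mutationtaster):.3f}"
min_mt = f"{min(mutationtaster):.3f}"
print("max rounded:", max_mt, "| min rounded:", min_mt)
print("annotated MutationTaster matches rounded max:", max_mt == annotated_mutationtaster)
```

The smallest PROVEAN value (5.7) belongs to ENST00000493795 and the largest (5.84) to ENST00000471181, so the annotated 5.84 is the largest raw value. Likewise, 0.997 is consistent with the largest MutationTaster value (0.996698) rounded, while the smallest rounds to 0.996.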
I checked the original code. The largest rankscore, converted_score, or converted_rankscore is used to represent multiple scores; the original raw scores are not used. In these fields, a higher value means more deleterious, which is different from the raw scores.
You did not give details, so I assume that you are referring to version 3.5a. The 3.5a header contains the fields PROVEAN_score, PROVEAN_converted_rankscore, and PROVEAN_pred for dbNSFP, and PROVEAN_converted_rankscore is the field used in the sorting procedure to select the largest score.
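The selection rule described here comes down to: put a tool's scores on one scale where a higher value always means more deleterious, then keep the maximum. A minimal sketch of that idea, not ANNOVAR's actual code; `most_deleterious` and its `higher_is_worse` flag are hypothetical names, and the correct direction for each tool has to be taken from the dbNSFP readme. Run on the PROVEAN values from this thread, it shows that the two candidate answers, 5.7 and 5.84, correspond to the two opposite orientations.

```python
from typing import Dict, Tuple


def most_deleterious(scores: Dict[str, float], higher_is_worse: bool) -> Tuple[str, float]:
    """Return the (transcript, raw score) pair judged most deleterious.

    Scores are put on a common scale where a larger value always means more
    deleterious (the idea behind converted scores), then the maximum is taken.
    The raw score of the winning transcript is returned unchanged.
    """
    sign = 1.0 if higher_is_worse else -1.0
    transcript = max(scores, key=lambda t: sign * scores[t])
    return transcript, scores[transcript]


# PROVEAN scores quoted earlier in this thread
provean = {
    "ENST00000357654": 5.74,
    "ENST00000493795": 5.7,
    "ENST00000471181": 5.84,
    "ENST00000354071": 5.71,
}

# Direction as quoted from the dbNSFP3.0 readme: smaller raw PROVEAN score = more deleterious
lower_is_worse_pick = most_deleterious(provean, higher_is_worse=False)
# The opposite assumption selects a different transcript
higher_is_worse_pick = most_deleterious(provean, higher_is_worse=True)
print(lower_is_worse_pick, higher_is_worse_pick)
```

With `higher_is_worse=False` the sketch picks ENST00000493795 (5.7); with `higher_is_worse=True` it picks ENST00000471181 (5.84).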
Thanks for your help, and sorry for the missing information. I used version 3.0a. However, my output contains only the PROVEAN_score and PROVEAN_Pred columns; the PROVEAN_converted_rankscore column does not appear in my annotated data. I was also wondering whether ANNOVAR always annotates the greatest p-value of the MutationTaster score.
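Since PROVEAN_converted_rankscore is missing from the annotated output, one way to narrow this down is to check whether the installed database file has that column at all. A minimal sketch, assuming the database follows ANNOVAR's humandb/hg19_<dbtype>.txt naming and begins with a tab-delimited header line; `columns_from_header` and `has_column` are hypothetical helpers. Only the first line is read, so the size of the database file does not matter.

```python
from pathlib import Path
from typing import List


def columns_from_header(line: str) -> List[str]:
    """Split a tab-delimited header line into column names."""
    return line.rstrip("\r\n").split("\t")


def has_column(db_path: Path, name: str) -> bool:
    """True if the database's first (header) line lists the column `name`."""
    with db_path.open() as handle:
        return name in columns_from_header(handle.readline())


# Hypothetical usage; the path assumes ANNOVAR's humandb/<buildver>_<dbtype>.txt naming:
# print(has_column(Path("humandb/hg19_dbnsfp30a.txt"), "PROVEAN_converted_rankscore"))
```

If the column is absent from the database header, it cannot appear in the annotated output either.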
1) Command line: I tried two commands that differ only in the -out argument. As expected, the result was the same.
First command: perl table_annovar.pl /Users/rosefroehlich/Desktop/vcf-original/QIAGEN/smcounter2/smCounter2_neu/QIAseq-DNA-smCounter2.5BC_S15.smCounter.anno.vcf humandb/ -buildver hg19 -out '/Users/rosefroehlich/Desktop/ANNOVAR/Annotierten_Dateien/QIAGEN/QIAseq-DNA-smCounter2.5BC_S15.smCounter.anno' -remove -protocol refGene,ensGene,cytoBand,exac03,gnomad211_genome,gnomad211_exome,1000g2015aug_all,1000g2015aug_eur,avsnp147,dbnsfp30a,cosmic70,clinvar_20190305 -operation g,g,r,f,f,f,f,f,f,f,f,f -nastring . -vcfinput -polish
Second command: perl table_annovar.pl /Users/rosefroehlich/Desktop/vcf-original/QIAGEN/smcounter2/smCounter2_neu/QIAseq-DNA-smCounter2.5BC_S15.smCounter.anno.vcf humandb/ -buildver hg19 -out /Users/rosefroehlich/Desktop/ANNOVAR/Annotierten_Dateien/QIAGEN/QIAseq-DNA-smCounter2.5BC_S15.smCounter.anno.hg19_multianno -remove -protocol refGene,ensGene,cytoBand,exac03,gnomad211_genome,gnomad211_exome,1000g2015aug_all,1000g2015aug_eur,avsnp147,dbnsfp30a,cosmic70,clinvar_20190305 -operation g,g,r,f,f,f,f,f,f,f,f,f -nastring . -vcfinput -polish
2) Screen output and 3) example input file: I sent you an email with a copy of the screen output of my second command and an example input file attached.
4) Operating system: Mac (we also tried it on Ubuntu, and the result was the same).
Thanks for your help.
I strongly suggest that you use version 3.5 or 3.3, because that is when the rankscores were introduced, so that the most deleterious score for a variant is annotated.
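Following this suggestion mostly means swapping one protocol name in the command used above. A minimal sketch that rebuilds that table_annovar.pl command with dbnsfp35a in place of dbnsfp30a, assuming that database has already been downloaded into humandb/ (for example with `annotate_variation.pl -buildver hg19 -downdb -webfrom annovar dbnsfp35a humandb/`). The helper name `build_table_annovar_cmd` and the `input.vcf`/`output` arguments are hypothetical placeholders. The helper also checks that -protocol and -operation list the same number of entries, since each protocol needs its own operation.

```python
import shlex
from typing import List


def build_table_annovar_cmd(vcf: str, out_prefix: str, protocols: List[str],
                            operations: List[str]) -> List[str]:
    """Assemble a table_annovar.pl argument list, checking one operation per protocol."""
    if len(protocols) != len(operations):
        raise ValueError(f"{len(protocols)} protocols but {len(operations)} operations")
    return ["perl", "table_annovar.pl", vcf, "humandb/", "-buildver", "hg19",
            "-out", out_prefix, "-remove",
            "-protocol", ",".join(protocols),
            "-operation", ",".join(operations),
            "-nastring", ".", "-vcfinput", "-polish"]


# Protocols and operations exactly as in the commands above
protocols = ["refGene", "ensGene", "cytoBand", "exac03", "gnomad211_genome",
             "gnomad211_exome", "1000g2015aug_all", "1000g2015aug_eur", "avsnp147",
             "dbnsfp30a", "cosmic70", "clinvar_20190305"]
operations = ["g", "g", "r"] + ["f"] * 9

# Swap the dbNSFP 3.0a database for 3.5a, leaving every other protocol unchanged
protocols = ["dbnsfp35a" if p == "dbnsfp30a" else p for p in protocols]

cmd = build_table_annovar_cmd("input.vcf", "output", protocols, operations)
print(shlex.join(cmd))  # shell-quoted command line, ready to paste (Python 3.8+)
```

Keeping the protocol and operation lists side by side makes it harder to add or remove a database without updating the matching operation.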
I cannot see any screen messages, but I do not see any problem in your command line.
Dear Mr. Wang,
1) Do I understand you correctly that before the rankscores were introduced (starting with dbNSFP version 3.3), the highest score was chosen (e.g. for PROVEAN_score)?
2) I annotated the gnomAD population frequencies using the v2.1.1 databases (gnomad211_genome and gnomad211_exome). When comparing the frequencies annotated for two variants with those shown on the gnomAD website, I found that non_cancer_AF_popmax never includes a genomic frequency for the annotated variants. Do you have an idea what my mistake is?
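This cannot be settled without the input file, but one thing worth checking (an assumption, not a diagnosis) is whether gnomad211_genome and gnomad211_exome add columns with identical names to the multianno table. If they do, any tool that looks columns up by name keeps only one of them; Python's `csv.DictReader`, for instance, keeps the last one. A minimal sketch that lists repeated column names in a tab-delimited header; `duplicated_columns` and the file name in the commented usage are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def duplicated_columns(header_line: str) -> Dict[str, List[int]]:
    """Map each column name that occurs more than once to its 0-based positions."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for index, name in enumerate(header_line.rstrip("\r\n").split("\t")):
        positions[name].append(index)
    return {name: idx for name, idx in positions.items() if len(idx) > 1}


# Why duplicates matter: DictReader maps a repeated name to the LAST column's value
row = next(csv.DictReader(io.StringIO("AF\tAF\n0.1\t0.2\n"), delimiter="\t"))
print(row["AF"])  # only the second column's value survives

# Hypothetical usage on a table_annovar text output:
# with open("sample.hg19_multianno.txt") as handle:
#     print(duplicated_columns(handle.readline()))
```

Comparing values by column position rather than by name avoids this shadowing.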
I sent you the screen messages and an example input file via email, as I cannot attach the input file here. If you still need them to answer the questions and did not receive them, please let me know.
Below is a copy of the screen output of my second command.
Thanks for your help. Rose
Last login: Fri Oct 4 16:15:45 on ttys001
Rose-Frohlichs-MacBook-Pro:~ rosefroehlich$ cd /Users/rosefroehlich/Desktop/ANNOVAR/annovar
Rose-Frohlichs-MacBook-Pro:annovar rosefroehlich$ perl table_annovar.pl /Users/rosefroehlich/Desktop/vcf-original/QIAGEN/smcounter2/smCounter2_neu/QIAseq-DNA-smCounter2.5BC_S15.smCounter.anno.vcf humandb/ -buildver hg19 -out /Users/rosefroehlich/Desktop/ANNOVAR/Annotierten_Dateien/QIAGEN/QIAseq-DNA-smCounter2.5BC_S15.smCounter.anno.hg19_multianno -remove -protocol refGene,ensGene,cytoBand,exac03,gnomad211_genome,gnomad211_exome,1000g2015aug_all,1000g2015aug_eur,avsnp147,dbnsfp30a,cosmic70,clinvar_20190305 -operation g,g,r,f,f,f,f,f,f,f,f,f -nastring . -vcfinput -polish
NOTICE: Running with system command <convert2annovar.pl -includeinfo -allsample -withfreq -format vcf4 /Users/rosefroehlich/Desktop/vcf-original/QIAGEN/smcounter2/smCounter2_neu/QIAseq-DNA-smCounter2.5BC_S15.smCounter.anno.vcf > /Users/rosefroehlich/Desktop/ANNOVAR/Annotierten_Dateien/QIAGEN/QIAseq-DNA-smCounter2.5BC_S15.smCounter.anno.hg19_multianno.avinput>
NOTICE: Finished reading 380 lines from VCF file
NOTICE: A total of 282 locus in VCF file passed QC threshold, representing 263 SNPs (188 transitions and 75 transversions) and 19 indels/substitutions
NOTICE: Finished writing allele frequencies based on 263 SNP genotypes (188 transitions and 75 transversions) and 19 indels/substitutions for 1 samples
NOTICE: Processing operation=g protocol=refGene
NOTICE: Running with system command <annotate_variation.pl -geneanno -buildver hg19 -dbtype refGene -outfile /Users/rosefroehlich/Desktop/ANNOVAR/Annotierten_Dateien/QIAGEN/QIAseq-DNA-smCounter2.5BC_S15.smCounter.anno.hg19_multianno.refGene -exonsort /Users/rosefroehlich/Desktop/ANNOVAR/Annotierten_Dateien/QIAGEN/QIAseq-DNA-smCounter2.5BC_S15.smCounter.anno.hg19_multianno.avinput humandb/>
NOTICE: Output files were written to /Users/rosefroehlich/Desktop/ANNOVAR/Annotierten_Dateien/QIAGEN/QIAseq-DNA-smCounter2.5BC_S15.smCounter.anno.hg19_multianno.refGene.variant_function, /Users/rosefroehlich/Desktop/ANNOVAR/Annotierten_Dateien/QIAGEN/QIAseq-DNA-smCounter2.5BC_S15.smCounter.anno.hg19_multianno.refGene.exonic_variant_function
NOTICE: Reading gene annotation from humandb/hg19_refGene.txt ... Done with 72212 transcripts (including 17527 without coding sequence annotation) for 28250 unique genes
NOTICE: Processing next batch with 282 unique variants in 282 input lines
NOTICE: Reading FASTA sequences from humandb/hg19_refGeneMrna.fa ... Done with 242 sequences
WARNING: A total of 446 sequences will be ignored due to lack of correct ORF annotation
NOTICE: Processing operation=g protocol=ensGene
NOTICE: Running with system command <annotate_variation.pl -geneanno -buildver hg19 -dbtype ensGene -outfile /Users/rosefroehlich/Desktop/ANNOVAR/Annotierten_Dateien/QIAGEN/QIAseq-DNA-smCounter2.5BC_S15.smCounter.anno.hg19_multianno.ensGene -exonsort /Users/rosefroehlich/Desktop/ANNOVAR/Annotierten_Dateien/QIAGEN/QIAseq-DNA-smCounter2.5BC_S15.smCounter.anno.hg19_multianno.avinput humandb/>
NOTICE: Output files were written to /Users/rosefroehlich/Desktop/ANNOVAR/Annotierten_Dateien/QIAGEN/QIAseq-DNA-smCounter2.5BC_S15.smCounter.anno.hg19_multianno.ensGene.variant_function, /Users/rosefroehlich/Desktop/ANNOVAR/Annotierten_Dateien/QIAGEN/QIAseq-DNA-smCounter2.5BC_S15.smCounter.anno.hg19_multianno.ensGene.exonic_variant_function
NOTICE: Reading gene annotation from humandb/hg19_ensGene.txt ... Done with 196501 transcripts (including 101155 without coding sequence annotation) for 57905 unique genes
NOTICE: Processing next batch with 282 unique variants in 282 input lines
NOTICE: Reading FASTA sequences from humandb/hg19_ensGeneMrna.fa ... Done with 343 sequences
WARNING: A total of 6780 sequences will be ignored due to lack of correct ORF annotation
NOTICE: Processing operation=r protocol=cytoBand
NOTICE: Processing operation=f protocol=exac03
NOTICE: Finished reading 8 column headers for '-dbtype exac03'
NOTICE: Processing operation=f protocol=gnomad211_genome
NOTICE: Finished reading 17 column headers for '-dbtype gnomad211_genome'
NOTICE: Processing operation=f protocol=gnomad211_exome
NOTICE: Finished reading 17 column headers for '-dbtype gnomad211_exome'
NOTICE: Processing operation=f protocol=1000g2015aug_all
NOTICE: Processing operation=f protocol=1000g2015aug_eur
NOTICE: Processing operation=f protocol=avsnp147
NOTICE: Processing operation=f protocol=dbnsfp30a
NOTICE: Finished reading 34 column headers for '-dbtype dbnsfp30a'
NOTICE: Processing operation=f protocol=cosmic70
NOTICE: Processing operation=f protocol=clinvar_20190305
NOTICE: Finished reading 5 column headers for '-dbtype clinvar_20190305'
NOTICE: VCF output is written to /Users/rosefroehlich/Desktop/ANNOVAR/Annotierten_Dateien/QIAGEN/QIAseq-DNA-smCounter2.5BC_S15.smCounter.anno.hg19_multianno.hg19_multianno.vcf
Rose-Frohlichs-MacBook-Pro:annovar rosefroehlich$