Ensembl / VEP_plugins

Plugins for the Ensembl Variant Effect Predictor (VEP)
Apache License 2.0

PolyPhen HVAR: VEP v103 and dbNSFP v4.1 discrepancies #410

Closed cccnrc closed 2 years ago

cccnrc commented 3 years ago

Hi all, I noticed that for some variants the HVAR values output by VEP (through --everything) and by dbNSFP v4.1a differ. For example, for a file annotated through the VEP v103 Docker image with this command:

./vep -v \
                --cache --offline \
                --assembly GRCh38 \
                --format vcf --vcf \
                --force_overwrite \
                --dir_cache /opt/vep/.vep/ \
                --everything \
                --pick \
                --input_file "/vcf_files/$(basename "$VCF")" \
                --plugin dbscSNV,/dbSCSNV/dbscSNV1.1_GRCh38.txt.gz \
                --plugin CADD,/CADD/whole_genome_SNVs.tsv.gz,/CADD/gnomad.genomes.r3.0.indel.tsv.gz \
                --custom /clinvar/clinvar.08oct2020.hg38.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN,CLNDISDB \
                --plugin dbNSFP,/dbNSFP/dbNSFP4.1a.hg38.gz,LRT_score,LRT_pred,GERP++_RS,MutationTaster_pred,MutationTaster_score,MutationAssessor_pred,MutationAssessor_score,FATHMM_score,FATHMM_pred,1000Gp3_AC,1000Gp3_AF,1000Gp3_AFR_AC,1000Gp3_AFR_AF,1000Gp3_EUR_AC,1000Gp3_EUR_AF,1000Gp3_AMR_AC,1000Gp3_AMR_AF,1000Gp3_EAS_AC,1000Gp3_EAS_AF,1000Gp3_SAS_AC,1000Gp3_SAS_AF,UK10K_AF,ESP6500_AA_AF,ESP6500_EA_AF,gnomAD_exomes_POPMAX_AF,gnomAD_exomes_POPMAX_nhomalt,gnomAD_genomes_POPMAX_AF,gnomAD_genomes_POPMAX_nhomalt,GTEx_V8_gene,GTEx_V8_tissue,Geuvadis_eQTL_target_gene,Polyphen2_HDIV_score,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_pred,ExAC_AC,ExAC_AF,ExAC_Adj_AF,ExAC_AFR_AC,ExAC_AFR_AF,ExAC_AMR_AC,ExAC_AMR_AF,ExAC_EAS_AC,ExAC_EAS_AF,ExAC_FIN_AC,ExAC_FIN_AF,ExAC_NFE_AC,ExAC_NFE_AF,ExAC_SAS_AC,ExAC_SAS_AF,REVEL_score,REVEL_rankscore,clinvar_id,clinvar_clnsig,clinvar_trait,clinvar_review,clinvar_hgvs,clinvar_var_source,clinvar_MedGen_id,clinvar_OMIM_id,clinvar_Orphanet_id,CADD_phred,ExAC_Adj_AC,gnomAD_exomes_AN,gnomAD_exomes_AC,gnomAD_genomes_AN,gnomAD_genomes_AC,gnomAD_exomes_controls_AC,gnomAD_exomes_controls_AN,gnomAD_exomes_AFR_AF,gnomAD_exomes_AMR_AF,gnomAD_exomes_ASJ_AF,gnomAD_exomes_EAS_AF,gnomAD_exomes_FIN_AF,gnomAD_exomes_NFE_AF,gnomAD_exomes_SAS_AF,gnomAD_exomes_controls_AF,gnomAD_genomes_AFR_AF,gnomAD_genomes_AMR_AF,gnomAD_genomes_ASJ_AF,gnomAD_genomes_EAS_AF,gnomAD_genomes_FIN_AF,gnomAD_genomes_NFE_AF,genename \
                -o "/vep_out/$(basename "$VCF" .vcf).VEP-annotated.vcf"

I have some variants where the default VEP PolyPhen value is benign(0.001) and the dbNSFP values are .&.&.&. (in both score and pred). For example:

##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|MANE_SELECT|MANE_PLUS_CLINICAL|TSL|APPRIS|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|UNIPROT_ISOFORM|SOURCE|GENE_PHENO|SIFT|PolyPhen|DOMAINS|miRNA|HGVS_OFFSET|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|AA_AF|EA_AF|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|MAX_AF|MAX_AF_POPS|CLIN_SIG|SOMATIC|PHENO|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|TRANSCRIPTION_FACTORS|ada_score|rf_score|CADD_PHRED|CADD_RAW|1000Gp3_AC|1000Gp3_AF|1000Gp3_AFR_AC|1000Gp3_AFR_AF|1000Gp3_AMR_AC|1000Gp3_AMR_AF|1000Gp3_EAS_AC|1000Gp3_EAS_AF|1000Gp3_EUR_AC|1000Gp3_EUR_AF|1000Gp3_SAS_AC|1000Gp3_SAS_AF|CADD_phred|ESP6500_AA_AF|ESP6500_EA_AF|ExAC_AC|ExAC_AF|ExAC_AFR_AC|ExAC_AFR_AF|ExAC_AMR_AC|ExAC_AMR_AF|ExAC_Adj_AC|ExAC_Adj_AF|ExAC_EAS_AC|ExAC_EAS_AF|ExAC_FIN_AC|ExAC_FIN_AF|ExAC_NFE_AC|ExAC_NFE_AF|ExAC_SAS_AC|ExAC_SAS_AF|FATHMM_pred|FATHMM_score|GERP++_RS|GTEx_V8_gene|GTEx_V8_tissue|Geuvadis_eQTL_target_gene|LRT_pred|LRT_score|MutationAssessor_pred|MutationAssessor_score|MutationTaster_pred|MutationTaster_score|Polyphen2_HDIV_pred|Polyphen2_HDIV_score|Polyphen2_HVAR_pred|Polyphen2_HVAR_score|REVEL_rankscore|REVEL_score|UK10K_AF|clinvar_MedGen_id|clinvar_OMIM_id|clinvar_Orphanet_id|clinvar_clnsig|clinvar_hgvs|clinvar_id|clinvar_review|clinvar_trait|clinvar_var_source|genename|gnomAD_exomes_AC|gnomAD_exomes_AFR_AF|gnomAD_exomes_AMR_AF|gnomAD_exomes_AN|gnomAD_exomes_ASJ_AF|gnomAD_exomes_EAS_AF|gnomAD_exomes_FIN_AF|gnomAD_exomes_NFE_AF|gnomAD_exomes_POPMAX_AF|gnomAD_exomes_POPMAX_nhomalt|gnomAD_exomes_SAS_AF|gnomAD_exomes_controls_AC|gnomAD_exomes_controls_AF|gnomAD_exomes_controls_AN|gnom
AD_genomes_AC|gnomAD_genomes_AFR_AF|gnomAD_genomes_AMR_AF|gnomAD_genomes_AN|gnomAD_genomes_ASJ_AF|gnomAD_genomes_EAS_AF|gnomAD_genomes_FIN_AF|gnomAD_genomes_NFE_AF|gnomAD_genomes_POPMAX_AF|gnomAD_genomes_POPMAX_nhomalt|ClinVar|ClinVar_CLNSIG|ClinVar_CLNREVSTAT|ClinVar_CLNDN|ClinVar_CLNDISDB">
chr9    111706686   rs7036568   A   C   326.58  PASS    CSQ=C|missense_variant|MODERATE|SHOC1|ENSG00000165181|Transcript|ENST00000374287|protein_coding|17/25||ENST00000374287.7:c.2427T>G|ENSP00000363405.3:p.Asn809Lys|2562|2427|809|N/K|aaT/aaG|rs7036568&COSV59502213&COSV59505598||-1||SNV|HGNC|HGNC:26535|YES|||5|P4|CCDS6781.3|ENSP00000363405|Q5VXU9.110||UPI0000458916|Q5VXU9-1|||tolerated(0.1)|benign(0.001)|Pfam:PF17825&PANTHER:PTHR35668|||0.2175|0.3275|0.2205|0.0476|0.2475|0.2106|0.3355|0.2517|0.2146|0.3258|0.1606|0.1939|0.05081|0.2115|0.2357|0.212|0.2461|0.3355|AA||0&1&1|0&1&1|||||||||8.419|0.696145|1089|0.21745207667731628|433|0.3275340393343419|153|0.22046109510086456|48|0.047619047619047616|249|0.24751491053677932|206|0.21063394683026584|8.419|0.3354516568315933|0.25168643870667595|26764|2.205e-01|3388|3.276e-01|1817|1.572e-01|26750|2.208e-01|448|5.186e-02|1404|2.125e-01|15429|2.315e-01|4076|2.478e-01|T&T&T&T|3.69&3.7&3.7&3.7|-3.94|GNG10&GNG10&GNG10&SHOC1&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&LRRC37A5P&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&GNG10&SHOC1&GNG10&SHOC1&GNG10&GNG10&GNG10&GNG10&AL354877.1&GNG10|Adipose_Subcutaneous&Adipose_Visceral_Omentum&Adrenal_Gland&Adrenal_Gland&Artery_Aorta&Artery_Coronary&Artery_Tibial&Brain_Anterior_cingulate_cortex_BA24&Brain_Caudate_basal_ganglia&Brain_Cerebellar_Hemisphere&Brain_Cerebellum&Brain_Cortex&Brain_Frontal_Cortex_BA9&Brain_Hippocampus&Brain_Hypothalamus&Brain_Nucleus_accumbens_basal_ganglia&Brain_Putamen_basal_ganglia&Breast_Mammary_Tissue&Cells_Cultured_fibroblasts&Cells_Cultured_fibroblasts&Colon_Sigmoid&Colon_Transverse&Esophagus_Gastroesophageal_Junction&Esophagus_Mucosa&Esophagus_Muscularis&Heart_Atrial_Appendage&Heart_Left_Ventricle&Lung&Muscle_Skeletal&Nerve_Tibial&Pancreas&Pituitary&Prostate&Skin_Not_Sun_Exposed_Suprapubic&Skin_Not_Sun_Exposed_Suprapubic&Skin_Sun_Exposed_Lower_leg&Skin_Sun_Exposed_Lower_leg&Small_Intest
ine_Terminal_Ileum&Spleen&Stomach&Thyroid&Thyroid&Whole_Blood||N|0.981503|.&.&.&.|.&.&.&.|P&P&P&P|1&1&1&1|.&.&.&.|.&.&.&.|.&.&.&.|.&.&.&.|0.07958|0.032|0.24451203385347792||||||||||C9orf84&C9orf84&C9orf84&C9orf84|53220|3.257936e-01|1.606052e-01|247990|1.939163e-01|5.080654e-02|2.115260e-01|2.356695e-01|3.257936e-01|853|2.460549e-01|22955|2.125306e-01|108008|36774|3.253305e-01|2.074765e-01|142994|2.081829e-01|5.399361e-02|2.114390e-01|2.439758e-01|2.493430e-01|90||||| GT:AD:DP:GQ:PL:PP   0/0:25,0:25:75:0,72,799:0,75,810    0/0:34,0:34:99:0,102,985:0,105,996  0/0:28,0:28:84:0,81,932:0,84,943    0/0:29,0:29:84:0,81,1215:0,84,1226  0/0:26,0:26:75:0,72,1080:0,75,1091  0/0:28,0:28:81:0,78,1170:0,81,1181  0/0:22,0:22:66:0,63,696:0,66,707    0/1:23,14:37:99:339,0,573:336,0,582 0/0:30,0:30:93:0,90,818:0,93,829    0/0:36,0:36:99:0,99,1050:0,102,1061 0/0:32,0:32:93:0,90,938:0,93,949    0/0:20,0:20:63:0,60,626:0,63,637    0/0:28,0:28:84:0,81,820:0,84,831

or with VEP probably_damaging(0.939) and dbNSFP .&.&.&. again:

chr10   98423813    rs2296434   G   C   873.42  PASS    CSQ=C|missense_variant|MODERATE|HPS1|ENSG00000107521|Transcript|ENST00000325103|protein_coding|15/20||ENST00000325103.10:c.1472C>G|ENSP00000326649.6:p.Pro491Arg|1706|1472|491|P/R|cCc/cGc|rs2296434&COSV57267549||-1||SNV|HGNC|HGNC:5163|YES|||5|P2|CCDS7475.1|ENSP00000326649|Q92902.179||UPI000006D5B0|Q92902-1||1|deleterious(0.01)|probably_damaging(0.939)|PANTHER:PTHR12761|||0.1244|0.1369|0.1614|0.1716|0.1123|0.045|0.1099|0.08453|0.1068|0.1072|0.1637|0.1211|0.1863|0.1291|0.08881|0.1121|0.04018|0.1863|gnomAD_EAS|benign|0&1|1&1|25741868&24033266&20301464||||||||25.2|3.634449|623|0.12440095846645367|181|0.13691376701966718|112|0.16138328530259366|173|0.17162698412698413|113|0.11232604373757456|44|0.044989775051124746|25.2|0.10985020426690877|0.08453488372093024|12650|1.042e-01|1125|1.093e-01|2008|1.740e-01|12627|1.045e-01|1589|1.843e-01|878|1.340e-01|6258|9.422e-02|661|4.004e-02|T&T&.&T|1.54&1.54&.&1.54|5.51|HPS1&PYROXD2&HPS1|Adipose_Subcutaneous&Whole_Blood&Whole_Blood||D|0.000008|M&M&M&.|2.685&2.685&2.685&.|P&P|0.00605226&0.00605226|.&.&.&.|.&.&.&.|.&.&.&.|.&.&.&.|0.42485|0.163|0.08265009256810367|C2931875&CN169374|203300||Benign|NC_000010.11:g.98423813G>C|21094|criteria_provided&_multiple_submitters&_no_conflicts|Hermansky-Pudlak_syndrome_1&not_specified|Illumina_Clinical_Services_Laboratory&Illumina:6193&UniProtKB_(protein):Q92902#VAR_005291|HPS1&HPS1&HPS1&HPS1|26818|1.072487e-01|1.636948e-01|251010|1.211007e-01|1.863033e-01|1.290517e-01|8.880704e-02|1.863033e-01|322|4.017770e-02|11849|1.083843e-01|109324|14393|1.050928e-01|1.241400e-01|143242|1.248496e-01|1.774297e-01|1.327856e-01|8.494717e-02|4.470743e-02|3|21094|Benign|criteria_provided&_multiple_submitters&_no_conflicts|Hermansky-Pudlak_syndrome_1&not_specified|MONDO:MONDO:0008748&MedGen:C2931875&OMIM:203300&MedGen:CN169374    GT:AD:DP:GQ:PL:PP   0/0:28,0:28:86:0,81,929:0,86,946    0/0:34,0:34:99:0,99,961:0,104,978   0/0:27,0:27:86:0,81,848:0,86,865    
0/1:19,21:40:99:445,0,422:440,0,433 0/0:25,0:25:77:0,72,808:0,77,825    0/0:41,0:41:95:0,90,1149:0,95,1166  0/0:26,0:26:77:0,72,1080:0,77,1097  0/1:11,20:31:99:447,0,238:442,0,249 0/0:34,0:34:99:0,99,1013:0,104,1030 0/0:25,0:25:77:0,72,1080:0,77,1097  0/0:38,0:38:99:0,103,1050:0,108,1067    0/0:35,0:35:99:0,102,1143:0,107,1160    0/0:34,0:34:95:0,90,1004:0,95,1021

Other variants are fine. Can you help me with this?
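
To make the comparison above easier to scan, here is a minimal sketch that prints the VEP PolyPhen field next to the dbNSFP Polyphen2_HVAR_pred and Polyphen2_HVAR_score fields for each CSQ entry. The function name csq_polyphen_compare is hypothetical, not part of VEP. It assumes the annotated VCF carries the ##INFO=<ID=CSQ ...> header shown above on a single line, and it looks the fields up by name rather than by fixed position.

```shell
# Hypothetical helper (not part of VEP): list PolyPhen vs. dbNSFP HVAR per CSQ entry.
# Output columns: CHROM, POS, ID, PolyPhen, Polyphen2_HVAR_pred, Polyphen2_HVAR_score
csq_polyphen_compare() {
    awk -F'\t' -v OFS='\t' '
        /^##INFO=<ID=CSQ/ {
            # Keep only the pipe-separated field names from the header description.
            fmt = $0
            sub(/.*Format: /, "", fmt)
            sub(/">.*/, "", fmt)
            n = split(fmt, names, "|")
            for (i = 1; i <= n; i++) idx[names[i]] = i
            next
        }
        /^#/ { next }
        {
            # Find the CSQ key in the INFO column, then walk its comma-separated entries.
            m = split($8, info, ";")
            for (j = 1; j <= m; j++) {
                if (info[j] !~ /^CSQ=/) continue
                k = split(substr(info[j], 5), entries, ",")
                for (e = 1; e <= k; e++) {
                    split(entries[e], f, "|")
                    print $1, $2, $3, f[idx["PolyPhen"]], f[idx["Polyphen2_HVAR_pred"]], f[idx["Polyphen2_HVAR_score"]]
                }
            }
        }
    ' "$@"
}
```

Called as csq_polyphen_compare annotated.vcf (or with the VCF on standard input), it emits one tab-separated line per transcript consequence, so rows where dbNSFP reports only dots stand out.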

at7 commented 3 years ago

Hello, there don't seem to be any PolyPhen scores for your examples in the dbNSFP file. I don't know how easy it is to find out from the dbNSFP team why the scores are missing from the file, but it would be worth a try. It could be related to the input data that is used for running PolyPhen. A description of how we run PolyPhen in Ensembl is available at https://www.ensembl.org/info/genome/variation/prediction/protein_function.html#polyphen. Please let me know if you have any more questions.

Best wishes, Anja
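
One way to confirm that the scores are absent from the dbNSFP file itself is to look at the raw rows for a position. The sketch below is only a column filter: select_columns is a hypothetical helper that keeps the columns of a tab-separated table whose header names match a regular expression. The header names used in the usage note (#chr, pos(1-based), ref, alt) are an assumption about the dbNSFP layout and should be checked against the first line of your file.

```shell
# Hypothetical helper: keep only the columns whose header name matches an
# extended regular expression. Expects a tab-separated table on standard
# input with the header row as its first line.
select_columns() {
    awk -F'\t' -v OFS='\t' -v pat="$1" '
        NR == 1 {
            # Remember the positions of the matching header names.
            for (i = 1; i <= NF; i++) if ($i ~ pat) keep[++n] = i
        }
        {
            line = ""
            for (j = 1; j <= n; j++) line = line (j > 1 ? OFS : "") $keep[j]
            print line
        }
    '
}
```

Assuming the file is bgzipped and tabix-indexed, and that it names chromosomes without the chr prefix, the first example could be checked with something like { zcat /dbNSFP/dbNSFP4.1a.hg38.gz | head -n 1; tabix /dbNSFP/dbNSFP4.1a.hg38.gz 9:111706686-111706686; } | select_columns '^(#chr|pos|ref$|alt$|Polyphen2_H)'. If the Polyphen2 columns contain only dots there, the gap is in the dbNSFP data and not in the plugin.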

cccnrc commented 3 years ago

Hi Anja, thank you very much for this. Is there any way to print out both HDIV and HVAR directly from VEP?

Enrico


at7 commented 3 years ago

Hi Enrico, VEP returns HumVar results by default. If you want HumDiv scores instead, you can request them with --humdiv.

Best wishes, Anja
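
Since --humdiv replaces the HumVar prediction rather than adding to it, one possible way to get both is to annotate the same input twice. This is a minimal sketch, not an official recipe. The function name vep_polyphen_both, the VEP_BIN variable, and the output naming are all hypothetical, and the remaining options are copied from the command in the original report (add --dir_cache, plugins, and so on as needed).

```shell
# Hypothetical wrapper: run VEP once with the default HumVar model and once
# with --humdiv, writing <prefix>.humvar.vcf and <prefix>.humdiv.vcf.
# VEP_BIN is an illustrative override for the path to the vep executable.
vep_polyphen_both() {
    input=$1
    prefix=$2
    for model in humvar humdiv; do
        extra=""
        if [ "$model" = humdiv ]; then
            extra="--humdiv"
        fi
        # $extra is left unquoted on purpose so it disappears when empty.
        "${VEP_BIN:-./vep}" --cache --offline --assembly GRCh38 \
            --format vcf --vcf --force_overwrite \
            --polyphen b $extra \
            --input_file "$input" \
            -o "${prefix}.${model}.vcf"
    done
}
```

For example, vep_polyphen_both /vcf_files/sample.vcf /vep_out/sample would produce two annotated files whose PolyPhen fields can then be compared side by side.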

dglemos commented 2 years ago

I'm going to close this issue but if you have more questions feel free to open a new one.

Best wishes, Diana