Ensembl / VEP_plugins

Plugins for the Ensembl Variant Effect Predictor (VEP)
Apache License 2.0

PolyPhen and SIFT score discrepancies between VEP 88 and dbNSFP v3.0 #46

Closed: bcrone closed this issue 6 years ago.

bcrone commented 7 years ago

I'm hitting an issue where PolyPhen and SIFT scores differ between VEP and dbNSFP. Here are some examples:

PolyPhen mismatches:

CHROM  POS        REF  ALT  VEP_POLYPHEN  VEP_PRED  DBNSFP_POLY_HDIV   DBNSFP_PRED
1      6485211    C    A    0.044         B         0.999              D
3      127336823  G    A    0.177         B         0.982,0.596,0.596  D,P,P
10     73574953   G    A    0.243         B         1.0                D
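One way to flag rows like these programmatically is sketched below. It copies the three rows from the table and treats the comma-separated dbNSFP fields as one entry per protein isoform; that interpretation, the `compare` helper, and the "agrees if any isoform class matches" rule are assumptions for illustration, not part of VEP or dbNSFP.

```python
# Rows copied from the PolyPhen mismatch table (VEP run with its default model).
# dbNSFP may report several comma-separated values for one variant
# (assumed here to be one per protein isoform); VEP reports one per transcript.
ROWS = [
    ("1", 6485211, "C", "A", 0.044, "B", "0.999", "D"),
    ("3", 127336823, "G", "A", 0.177, "B", "0.982,0.596,0.596", "D,P,P"),
    ("10", 73574953, "G", "A", 0.243, "B", "1.0", "D"),
]


def split_multi(field: str) -> list[str]:
    """Split a comma-separated dbNSFP field into its individual entries."""
    return [x.strip() for x in field.split(",") if x.strip()]


def compare(row):
    """Hypothetical check: does VEP's class match any dbNSFP isoform class,
    and how far is VEP's score from the closest dbNSFP score?"""
    chrom, pos, ref, alt, vep_score, vep_pred, db_scores, db_preds = row
    scores = [float(s) for s in split_multi(db_scores)]
    preds = split_multi(db_preds)
    agrees = vep_pred in preds
    min_diff = min(abs(vep_score - s) for s in scores)
    return f"{chrom}:{pos}{ref}>{alt}", agrees, round(min_diff, 3)


for row in ROWS:
    print(compare(row))
```

Even against the closest isoform score, every row disagrees on the class, so a per-isoform mismatch alone does not explain these examples.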

Is there a known reason for these discrepancies?

willmclaren commented 7 years ago

For PolyPhen, VEP outputs HumVar scores by default; you can output HumDiv scores instead with --humdiv. Since the dbNSFP column you are comparing against is the HumDiv score (DBNSFP_POLY_HDIV), this may explain the discrepancy.
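Besides producing different scores, the HumDiv and HumVar models use different cutoffs for their benign/possibly/probably damaging calls, so predictions are only comparable when both sides use the same model. The sketch below illustrates this with the thresholds listed in the dbNSFP README for Polyphen2_HDIV_pred and Polyphen2_HVAR_pred; treat those values as an assumption and check them against the README of your dbNSFP version. The `classify` helper is hypothetical.

```python
# Assumed PolyPhen-2 cutoffs, as listed in the dbNSFP README for
# Polyphen2_HDIV_pred and Polyphen2_HVAR_pred (verify for your version).
THRESHOLDS = {
    # model: (minimum score for "possibly", minimum score for "probably")
    "humdiv": (0.453, 0.957),
    "humvar": (0.447, 0.909),
}


def classify(score: float, model: str) -> str:
    """Hypothetical helper: map a PolyPhen-2 score to a B/P/D letter."""
    possibly, probably = THRESHOLDS[model]
    if score >= probably:
        return "D"  # probably damaging
    if score >= possibly:
        return "P"  # possibly damaging
    return "B"      # benign


for s in (0.45, 0.93):
    print(s, classify(s, "humdiv"), classify(s, "humvar"))
```

The same numeric score can land in different classes under the two models (0.45 is B under HumDiv but P under HumVar), which is another reason to compare HumDiv against HumDiv.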

For SIFT, the prediction can be affected by a number of factors including the version of SIFT used and the underlying protein database used to create alignments. See the Ensembl documentation on how we produce both SIFT and PolyPhen predictions.

bcrone commented 7 years ago

I agree that the SIFT discrepancies are likely due to version differences, and that is acceptable going forward. For PolyPhen, I reran VEP with the --humdiv option. This reduced the number of discrepancies between the VEP and dbNSFP PolyPhen scores, but a significant number of mismatches remain. Here are a few examples:

CHROM  POS        REF  ALT  VEP_POLYPHEN  VEP_PRED  DBNSFP_POLY_HDIV  DBNSFP_PRED
1      215960144  T    C    0.001         B         0.996             D
5      89921010   C    A    0.015         B         0.872             P
10     73447440   G    A    0.416         B         0.65,0.487,0.454  P
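A quick consistency check on the --humdiv rows can show where the disagreement lies. The sketch below assumes the HumDiv cutoffs listed in the dbNSFP README (0.453 and 0.957) and compares dbNSFP's single call with its highest isoform score; both choices are assumptions, and `check` is a hypothetical helper.

```python
# Assumed HumDiv cutoffs (dbNSFP README, Polyphen2_HDIV_pred).
HDIV_POSSIBLY, HDIV_PROBABLY = 0.453, 0.957

# (variant, VEP --humdiv score, VEP call, dbNSFP HDIV scores, dbNSFP call)
ROWS = [
    ("1:215960144T>C", 0.001, "B", [0.996], "D"),
    ("5:89921010C>A", 0.015, "B", [0.872], "P"),
    ("10:73447440G>A", 0.416, "B", [0.65, 0.487, 0.454], "P"),
]


def hdiv_class(score: float) -> str:
    """Map a HumDiv score to B/P/D using the assumed cutoffs."""
    if score >= HDIV_PROBABLY:
        return "D"
    if score >= HDIV_POSSIBLY:
        return "P"
    return "B"


def check(rows):
    """Is each tool's call consistent with its own score under HumDiv cutoffs?"""
    results = []
    for variant, vep_score, vep_call, db_scores, db_call in rows:
        vep_ok = hdiv_class(vep_score) == vep_call
        # Compare dbNSFP's single call with its highest isoform score (assumption).
        db_ok = hdiv_class(max(db_scores)) == db_call
        results.append((variant, vep_ok, db_ok))
    return results


for result in check(ROWS):
    print(result)
```

For these three rows, both tools' calls follow from their own scores, so the mismatch is in the raw scores rather than in how they are thresholded. That points toward differences in PolyPhen-2 version or input data, as with SIFT.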

bcrone commented 7 years ago

As an additional follow-up: what does the "unknown" PolyPhen annotation signify? It is not a standard PolyPhen class, and I can't find a definition in the VEP documentation.

willmclaren commented 7 years ago

According to http://genetics.bwh.harvard.edu/pph2/dokuwiki/_media/hg0720.pdf, UNKNOWN is a rare PolyPhen-2 prediction class.
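Because UNKNOWN is a separate class, a comparison script should keep it apart rather than folding it into benign. The sketch below assumes VEP was run with `--polyphen b`, which writes values such as `benign(0.044)`; the score part is treated as optional in case a bare `unknown` appears. The regex, `TO_LETTER` mapping, and `parse_polyphen` helper are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed format of VEP's PolyPhen column with --polyphen b,
# e.g. "benign(0.044)" or "probably_damaging(0.999)". The "(score)" part
# is optional so that a bare "unknown" also parses.
PPH_FIELD = re.compile(r"^(?P<pred>[a-z_]+)(?:\((?P<score>[0-9.]+)\))?$")

# Map PolyPhen class names to dbNSFP-style letters; "unknown" has no letter.
TO_LETTER = {"benign": "B", "possibly_damaging": "P", "probably_damaging": "D"}


def parse_polyphen(field: str) -> Tuple[Optional[str], Optional[float]]:
    """Return (B/P/D letter, or None for unknown; score, or None if absent)."""
    m = PPH_FIELD.match(field.strip())
    if not m:
        raise ValueError(f"unrecognised PolyPhen field: {field!r}")
    score = float(m.group("score")) if m.group("score") else None
    return TO_LETTER.get(m.group("pred")), score
```

Returning `None` for UNKNOWN lets a caller exclude those rows from a VEP vs dbNSFP comparison instead of counting them as mismatches.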