Closed bcrone closed 6 years ago
For PolyPhen scores, VEP outputs humvar by default (you can output humdiv instead with --humdiv); this may explain the discrepancy.
For SIFT, the prediction can be affected by a number of factors including the version of SIFT used and the underlying protein database used to create alignments. See the Ensembl documentation on how we produce both SIFT and PolyPhen predictions.
I agree that the SIFT discrepancies are likely due to versioning differences, and this is OK moving forward.
For PolyPhen, I ran VEP with the --humdiv option, and this reduced the number of discrepancies between VEP and dbNSFP PolyPhen scoring, but there are still a significant number of mismatches between the two.
Here are a few examples:
CHROM POS REF ALT VEP_POLYPHEN VEP_PRED DBNSFP_POLY_HDIV DBNSFP_PRED
1 215960144 T C 0.001 B 0.996 D
5 89921010 C A 0.015 B 0.872 P
10 73447440 G A 0.416 B 0.65,0.487,0.454 P
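To make the comparison concrete, here is a minimal Python sketch of how rows like these could be checked. It assumes the comma-separated dbNSFP values are alternative scores for the same variant (for example, one per transcript), which I have not confirmed, and it compares the single VEP score against the closest of them. The helper names and the 0.05 tolerance are hypothetical choices for illustration, not anything defined by VEP or dbNSFP.

```python
def parse_scores(field):
    """Split a comma-separated score field into floats, skipping '.' or empty entries."""
    return [float(x) for x in field.split(",") if x not in ("", ".")]


def closest_gap(vep_score, dbnsfp_field):
    """Smallest absolute difference between the VEP score and any dbNSFP score.

    Returns None when the dbNSFP field holds no usable score.
    """
    scores = parse_scores(dbnsfp_field)
    if not scores:
        return None
    return min(abs(vep_score - s) for s in scores)


def mismatches(rows, tol=0.05):
    """Return (chrom, pos, gap) for rows where no dbNSFP score is within tol of VEP.

    tol is an arbitrary illustrative threshold (assumption).
    """
    flagged = []
    for chrom, pos, ref, alt, vep_score, dbnsfp_field in rows:
        gap = closest_gap(vep_score, dbnsfp_field)
        if gap is not None and gap > tol:
            flagged.append((chrom, pos, round(gap, 3)))
    return flagged


# The three example rows above: (CHROM, POS, REF, ALT, VEP_POLYPHEN, DBNSFP_POLY_HDIV)
rows = [
    ("1", 215960144, "T", "C", 0.001, "0.996"),
    ("5", 89921010, "C", "A", 0.015, "0.872"),
    ("10", 73447440, "G", "A", 0.416, "0.65,0.487,0.454"),
]

for chrom, pos, gap in mismatches(rows):
    print(chrom, pos, gap)
```

Under that assumption, the third row falls within tolerance of one of the dbNSFP values, so it may be a transcript-selection difference rather than a scoring difference; the first two rows remain clear mismatches either way.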
As an additional follow-up: what does the "unknown" PolyPhen annotation signify? This is not a standard PolyPhen annotation, and I cannot find a definition in the VEP documentation.
http://genetics.bwh.harvard.edu/pph2/dokuwiki/_media/hg0720.pdf says UNKNOWN is a rare prediction class
I'm hitting an issue where PolyPhen and SIFT scores differ between VEP and dbNSFP. Here are some examples:
-PolyPhen mismatch:
CHROM POS REF ALT VEP_POLYPHEN VEP_PRED DBNSFP_POLY_HDIV DBNSFP_PRED
1 6485211 C A 0.044 B 0.999 D
3 127336823 G A 0.177 B 0.982,0.596,0.596 D,P,P
10 73574953 G A 0.243 B 1.0 D
Any reasoning behind these discrepancies?