Clinical-Genomics / scout

VCF visualization interface
https://clinical-genomics.github.io/scout
BSD 3-Clause "New" or "Revised" License

variant scoring discrepancy #4959

Closed ielvers closed 1 month ago

ielvers commented 1 month ago

Hello dear Scout team!

I'm classifying a variant with the updated classification tool. But the added classification in the bottom left corner, including a score, does not always match the classification in the middle (which is the value that gets submitted when pressing Submit). Some examples:

[10 screenshots of example classifications]

So sometimes (example scores -4, 8, 9, 10 above) the interpretation in the bottom left corner does not match the overall classification in the bottom middle. It is also extra confusing that scores 6 and 7 give Likely pathogenic in the middle while scores 8 and 9 lower the classification to VUS.

For the Likely benign examples, I don't think score -1 is meant to be Likely benign? That should still be a VUS, right? I'm mostly including those to show that there is a discrepancy among Likely benign / Benign as well.

dnil commented 1 month ago

Dear @ielvers! Thank you for the feedback!

In general, it is known that the two scales do not have a perfect overlap. This happens only very occasionally in the VUS to LP/P span unless conflicting benign terms are given. Have a discussion with some of the local gurus, but my reading of Tavtigian is that they very much think that the go-directly-to-VUS effect of any mildly benign criteria added to a clear pathogenic variant is a bit, shall we say, un-Bayesian.

It would help a lot if you would provide the actual terms you provided for each verdict - it is absolutely not unlikely that we missed something, but in general, the models will be giving a bit different results if you go in with a variant with terms that are conflicted between benign and pathogenic, whereas they should be very consistent on each side.

dnil commented 1 month ago

> For the Likely benign examples, I don't think score -1 is meant to be Likely benign? That should still be a VUS, right? I'm mostly including those to show that there is a discrepancy among Likely benign / Benign as well.

I'm not quite sure what you mean by this? See e.g. https://pmc.ncbi.nlm.nih.gov/articles/PMC8011844/ table 3. Or consider that the scale, from a VUS perspective, is centered somewhere around the tepid range, and 0 is the point on the thermometer scale where a VUS freezes forever into a likely benign. 🥶

EDIT: ah wait, actually, just looking at the criteria, an LB boundary of -2 would give you better fit to the Richards rules, but not to the risk of misclassification! 😊

ielvers commented 1 month ago

Hi Daniel!

I appreciate your response but I'm not quite sure I follow. I thought the point of the bottom left box was to give additional information to the summary classification in the bottom middle, not to use something completely independent? I realize I might have overreacted :)

  1. PVS1 + PP4 (or PP1) gives "Score 9 Likely pathogenic" vs "Uncertain Significance"
  2. PS1 + PS3 gives "Score 8 Likely pathogenic" vs "Pathogenic"
  3. PS1 + PS3 + BS2 gives "Score 4 Warm VUS" vs "Pathogenic"

Are the two classification systems supposed to disagree that much?
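For readers who want to reproduce the scores in these three examples, here is a minimal sketch of the points scale from the Tavtigian et al. paper linked earlier in the thread (PMC8011844). It is not Scout's implementation: the function names are made up for illustration, strength-modified criteria and the stand-alone BA1 are not handled, and the class boundaries are the ones I understand that paper to propose.

```python
# Points per evidence strength on the Tavtigian et al. (2020) scale.
POINTS = {"PVS": 8, "PS": 4, "PM": 2, "PP": 1, "BP": -1, "BS": -4}


def strength(criterion):
    """Return the strength prefix of a criterion code, e.g. 'PS3' -> 'PS'."""
    for prefix in ("PVS", "PS", "PM", "PP", "BS", "BP"):
        if criterion.startswith(prefix):
            return prefix
    raise ValueError(f"unsupported criterion: {criterion}")


def points_score(criteria):
    """Sum the points of all criteria; benign criteria count negatively."""
    return sum(POINTS[strength(c)] for c in criteria)


def points_class(score):
    """Map a total score to a class using the paper's proposed boundaries."""
    if score >= 10:
        return "Pathogenic"
    if score >= 6:
        return "Likely pathogenic"
    if score >= 0:
        return "Uncertain significance"
    if score >= -6:
        return "Likely benign"
    return "Benign"


for example in (["PVS1", "PP4"], ["PS1", "PS3"], ["PS1", "PS3", "BS2"]):
    total = points_score(example)
    print(" + ".join(example), "->", total, points_class(total))
```

With these boundaries a score of -1 already lands in Likely benign, and scores 6 through 9 all land in Likely pathogenic.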

dnil commented 1 month ago

Right, a little bit of both; it does show the idiosyncratic nature of a couple of Richards et al rules, notes a few missing potential rules, highlights the lack of full guidance in Richards on how to weigh conflicting benign and pathogenic criteria - and gives a cool colourable output scale in the VUS regime.
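To make the Richards side of the comparison concrete, here is a rough sketch of the combining rules as I read them in Richards et al. (2015). It is an illustration only, not Scout's code; the helper names are hypothetical, and strength-modified criteria are not handled. Note how a single BS triggers no benign class on its own, so it does not count as a conflict.

```python
from collections import Counter

PREFIXES = ("PVS", "PS", "PM", "PP", "BA", "BS", "BP")


def count_strengths(criteria):
    """Count criteria per strength prefix, e.g. ['PS1', 'PS3'] -> {'PS': 2}."""
    counts = Counter()
    for criterion in criteria:
        prefix = next((p for p in PREFIXES if criterion.startswith(p)), None)
        if prefix is None:
            raise ValueError(f"unsupported criterion: {criterion}")
        counts[prefix] += 1
    return counts


def classify_richards(criteria):
    n = count_strengths(criteria)  # missing prefixes count as 0
    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]
    ba, bs, bp = n["BA"], n["BS"], n["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    # Contradictory evidence: both a pathogenic and a benign rule are met.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"


for example in (["PVS1", "PP4"], ["PS1", "PS3"], ["PS1", "PS3", "BS2"]):
    print(" + ".join(example), "->", classify_richards(example))
```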

I have a sense you are step by step rediscovering the first Tavtigian paper conclusions, and a little more? https://pmc.ncbi.nlm.nih.gov/articles/PMC6336098/

  1. This is as predicted. On a personal note, seeing a frameshift in a gene with very high specificity, with no contradicting criteria, would make me itch to call it an LP. I think we would have, back in the Sanger one-gene-at-a-time days. The same if the frameshift was "only" cosegregating with disease in several family members. Note perhaps that LP(i) (1 PVS + 1 PM ~ 0.994) in https://pmc.ncbi.nlm.nih.gov/articles/PMC6336098/#T1 is stronger than expected by Richards. A single PVS on its own gives an Odds_Path of 350, or a Post_P of 0.975, about as much as the somewhat controversial P(ii) rule. And 1 PVS + 1 PP gives a Post_P of ~0.99, within the 0.90 to 0.99 LP range.
  2. Your second observation (PS1+PS3) exactly corresponds to the bolded Path (ii) in https://pmc.ncbi.nlm.nih.gov/articles/PMC6336098/#T1. They find that the posterior probability of this is decidedly lower than for the other Pathogenic criteria from Richards (0.975 compared to 0.994), and weaker than e.g. LP(i) (1 PVS + 1 PM = 0.994).
  3. Having established that PS1+PS3 is too strong in Richards, this is now rather clear. But it also highlights what happens when you have a sum rule that adds in the benign criteria before they trigger a classification in Richards. To us, a BS alone does not trigger any classification (the variant stays a VUS). But PS1+PS3-BS1 in Tavtigian gives a meager Odds_Path of 18.7 or Post_P of 0.675. Way below an LP; a warm VUS, very much risking dropping to tepid. I will however hold that you are not going to be able to give me all that many examples of known pathogenic variants, with solid functional evidence, that also show a frequency that clearly puts them above the (continental) population frequency threshold for a disease of that kind and inheritance pattern, so maybe we are wandering into somewhat academic territory here?
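
The posterior probabilities quoted in these three points can be recomputed with a short sketch of the Bayesian framework from the first Tavtigian paper. The constants are what I understand that paper to use (prior 0.10, very strong odds 350, each lower strength level being the square root of the one above it); the function names are hypothetical and this is not Scout's code.

```python
PRIOR_P = 0.10  # prior probability of pathogenicity (assumed from the paper)
O_PVS = 350.0   # odds of pathogenicity for "very strong" evidence

# Each criterion contributes O_PVS raised to this exponent; benign is negative.
EXPONENT = {
    "PVS": 1.0, "PS": 1 / 2, "PM": 1 / 4, "PP": 1 / 8,
    "BS": -1 / 2, "BP": -1 / 8,
}


def odds_path(strengths):
    """Combined odds of pathogenicity for a list of strength codes."""
    return O_PVS ** sum(EXPONENT[s] for s in strengths)


def post_p(strengths, prior=PRIOR_P):
    """Posterior probability of pathogenicity given the combined odds."""
    odds = odds_path(strengths)
    return odds * prior / ((odds - 1) * prior + 1)


for combo in (["PVS", "PM"], ["PS", "PS"], ["PS", "PS", "BS"]):
    print(" + ".join(combo), round(odds_path(combo), 1), round(post_p(combo), 3))
```

Under these assumptions 1 PVS + 1 PM comes out near 0.994, two PS near 0.975, and two PS with one BS near 0.675 (odds of about 18.7).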

dnil commented 1 month ago

Ah sorry, you had BS2 in your third example! 😊 I guess those would be a bit more common - that'd be the reduced penetrance ones. But that is another step entirely. Which we could probably formulate in statistical terms, or maybe find the missing contributions in other factors, genetic or otherwise. But another day! 😅

ielvers commented 1 month ago

Thanks for responding, Daniel. I appreciate the score within the bottom left classification. I thought it was supposed to match the bottom middle classification, but I was wrong :)

dnil commented 1 month ago

Super! Use it with thought, not just picking the result that fits the feeling at the moment, and perhaps as an indicator for when to dig a little deeper into the Richards criteria. In general I do feel Tavtigian is a bit more thought through, but note that we formally usually say we use Richards when reporting. Bayesian or not, this is still far from a fully automatic process - quite some manual judgement is involved.