exomiser / Exomiser

A Tool to Annotate and Prioritize Exome Variants
https://exomiser.readthedocs.io
GNU Affero General Public License v3.0

Changing defaults for LOF variants? #209

Closed damiansm closed 5 years ago

damiansm commented 7 years ago

Current behaviour is to give LOF variants such as stop gain a default score of 0.95. I remember that when the original Exomiser (mouse-only) paper was reviewed, the reviewers found it crazy that we were not scoring them as 1. My argument was always that (i) increasing it to 1 reduced performance on the simulated exomes, (ii) there are plenty of LOF variants in "normal" exomes, and (iii) a stop gain near the end of the protein could of course have no effect.

However, the whole rationale of the GEL pipeline is to classify LOF variants in panel genes as tier 1, with missense etc. being tier 2 and anything outside the panel tier 3. This fits with what NHS diagnostic labs want.

When running Exomiser/Genomiser I quite often see we have got the right gene as the top hit but assigned a missense variant with a PolyPhen score of 1 as the contributing variant rather than the tier 1, LOF variant that is considered diagnostic by the labs.
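To make the ranking problem concrete, here is a toy sketch. It is not Exomiser code: the record layout and the `pick_contributing` helper are hypothetical, and it assumes, as a simplification, that the contributing variant is simply the highest-scoring variant in the gene.

```python
# Toy illustration only: hypothetical records, not Exomiser's data model.
variants_in_top_gene = [
    {"effect": "MISSENSE", "score": 1.0},    # missense with a PolyPhen score of 1
    {"effect": "STOP_GAIN", "score": 0.95},  # LOF variant with the 0.95 default
]


def pick_contributing(variants):
    """Assumed simplification: the highest-scoring variant is flagged as contributing."""
    return max(variants, key=lambda v: v["score"])


print(pick_contributing(variants_in_top_gene)["effect"])  # MISSENSE
```

Under that assumption the missense variant outranks the stop gain, even though the labs would consider the stop gain diagnostic.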

Not sure what the solution is, but it may be worth reinvestigating the default variant scores for non-missense variants, as a lot has changed since the original Exomiser paper, e.g. we are now using hiPhive and the frequency databases have expanded a lot.

damiansm commented 6 years ago

Also see Peter's comments in Dec: "I am working on code for the genotype likelihood ratio, and would like to generate a score that will reflect the probability of Exomiser calling a pathogenic variant with a given score if the gene is NOT the disease gene in the patient. I am looking through the heuristic scores we generated a few years back -- I think we made the decision that frameshift needed to be 85% based on permutations. I would like to revisit this, since I think it would be better to set these scores to 100% for the likelihood ratio test. We will do that in a local class for the moment, but do you remember where the 85% comes from? I wonder if it might be good to revisit these scores even for the 100,000 project currently, because since so much else in our software has gotten better, maybe they are no longer optimal."

damiansm commented 6 years ago

On 250+ GeL diagnosed cases, changing STOP_GAIN and FRAMESHIFT variants to 1.0f increased our recall from 0.7598 to 0.81102 when just considering the variants flagged as contributing.

In addition, the GeL tiering pipeline considers these, as well as START_LOSS_SCORE, STOP_LOSS_SCORE and SPLICING_SCORE, as maximally pathogenic, i.e. tier 1.
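As a rough sketch of what such a change amounts to (illustrative only, not Exomiser's implementation): the dictionary layout, the `with_max_lof_scores` helper, and the effect keys `START_LOSS`, `STOP_LOSS` and `SPLICING` are hypothetical names derived from the constants mentioned here. Only the 0.95 stop-gain default and the 1.0 target come from this thread.

```python
# Hypothetical sketch, not Exomiser's actual code.
# Default pathogenicity scores keyed by variant effect.
CURRENT_DEFAULTS = {"STOP_GAIN": 0.95}  # the only default value quoted in this issue

# Effects the GeL tiering pipeline is described as treating as maximally pathogenic.
GEL_TIER1_EFFECTS = ("STOP_GAIN", "FRAMESHIFT", "START_LOSS", "STOP_LOSS", "SPLICING")


def with_max_lof_scores(defaults, effects=GEL_TIER1_EFFECTS, max_score=1.0):
    """Return a copy of `defaults` with every listed effect set to `max_score`."""
    updated = dict(defaults)  # leave the original mapping untouched
    for effect in effects:
        updated[effect] = max_score
    return updated


proposed = with_max_lof_scores(CURRENT_DEFAULTS)
print(proposed["STOP_GAIN"], proposed["FRAMESHIFT"])  # 1.0 1.0
```

Keeping the override as a separate function rather than editing the defaults in place makes it easy to benchmark both score sets against the same diagnosed cases.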

pnrobinson commented 6 years ago

Interesting! This suggests we need to recalibrate/rethink these constants?

damiansm commented 6 years ago

@pnrobinson I think this fits with what you were suggesting back in Dec - see above comment. On the NHS side the clinical geneticists want to see the nonsense, frameshift and splicing variants above all other types after standard filtering. We are thinking of changing this in release 10.0.0, which we hope to push out today or tomorrow. Do you think it is too radical? The evidence on real GeL cases certainly seems compelling. I think we were too careful in the past, as we did not have enough pop freq data to filter out all the benign LOF variants, but that is no longer an issue.

pnrobinson commented 6 years ago

I would agree with this. We should also figure out a statistical approach for deriving the best scores moving forward. The LR algorithm may help to do this, but the estimated time of arrival is a few months...

damiansm commented 6 years ago

That is good. From what I have seen with the GeL pipeline, if one of these LOF variants remains after all the filters and fits the MOI for the disease and gene, then it nearly always gets diagnosed. There could be an element of bias in all this, as these are the ones flagged as tier 1 by the pipeline and highlighted on the "homepage" of results, so the GMCs could be ignoring the other filtered variants.

damiansm commented 5 years ago

This was done in release 11.0.0