TheJacksonLaboratory / LIRICAL

LIkelihood Ratio Interpretation of Clinical AbnormaLities
https://thejacksonlaboratory.github.io/LIRICAL/stable

Consider heuristic for genes with lots of benign variants #162

Closed pnrobinson closed 5 years ago

pnrobinson commented 5 years ago

Currently, we do not downweight genes that have, say, ten benign variants and one variant called pathogenic. However, experience shows that these cases often produce false positives. We could consider a heuristic that downweights the score for the numerator by a factor of

(0.95)^n

where n is the number of benign variants. It would be nicer to frame this as a probability distribution.
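
A minimal sketch of the proposed downweighting, assuming the factor is applied multiplicatively to the numerator P(G|D). The function name and the `decay` parameter are illustrative and are not taken from the LIRICAL code base.

```python
def downweighted_numerator(p_g_given_d: float, n_benign: int, decay: float = 0.95) -> float:
    """Scale the likelihood-ratio numerator by decay**n_benign (hypothetical helper)."""
    if n_benign < 0:
        raise ValueError("n_benign must be non-negative")
    return p_g_given_d * decay ** n_benign


# With ten benign variants the numerator shrinks to roughly 60% of its value.
print(downweighted_numerator(1.0, 10))
```

Because the factor decays geometrically, each additional benign variant removes a further 5% of the remaining weight, so the penalty never drives the numerator to exactly zero.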

pnrobinson commented 5 years ago

This is apparent in the example for HEREDITARY MYOPATHY WITH EARLY RESPIRATORY FAILURE, which has ten variants in TTN, including one with a pathogenicity score of 0.96 0.97. The values are λdisease=1, λbackground=9.4235, P(G|D)=0.3731, P(G|¬D)=0.0007, and log10(LR)=2.72.

pnrobinson commented 5 years ago

The problem is that although it makes sense to say that P(G|D)=0.3731, because we found one variant, computing P(G|¬D)=0.0007 is not a good way to calculate this probability when the background lambda is 9.4. This is probably related to the way we are calculating the background frequency (which is an overestimate).
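
To illustrate why the ratio becomes so large, here is a rough sketch that assumes the genotype likelihood is a Poisson probability, extended to non-integer scores through the gamma function. That modelling choice is an assumption made for this example, and the function names are hypothetical. An observed score of 1.0 (one predicted pathogenic variant) is used for simplicity, so the numbers are of the same order as those quoted above but not identical to them.

```python
import math


def poisson_density(x: float, lam: float) -> float:
    """Poisson probability for a real-valued count x >= 0 and rate lam > 0.

    Uses lgamma(x + 1) in place of log(x!) so that non-integer x is allowed.
    """
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1.0))


def log10_likelihood_ratio(x: float, lam_disease: float, lam_background: float) -> float:
    """log10 of P(G|D) / P(G|not D) under the assumed Poisson model."""
    return math.log10(poisson_density(x, lam_disease) / poisson_density(x, lam_background))


# One variant observed, lambda_disease = 1, lambda_background = 9.4235.
print(poisson_density(1.0, 1.0))
print(poisson_density(1.0, 9.4235))
print(log10_likelihood_ratio(1.0, 1.0, 9.4235))
```

Under this assumption, a large background rate makes observing only one variant look very unlikely in the background model, which is what inflates the ratio.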

pnrobinson commented 5 years ago

Let's try this heuristic: The background score ranged from 0 to 20.8 (for {\it MUC16}). Numerous disease-associated genes displayed scores over 1.0, including for example {\it TTN} with a score of 9.4. According to our model, it is not surprising to observe a predicted pathogenic variant in a gene such as {\it TTN}, whether or not the gene is associated with the disease being investigated in any particular case. We can best achieve this by setting $\lambda^{\mathcal{B}}_g=\min(\lambda^{\mathcal{D}}_g,\lambda^{\mathcal{B}}_g)$ in cases where $\lambda^{\mathcal{B}}_g\geq 1$. For instance, if one predicted pathogenic variant is identified in {\it TTN}, this scheme would lead to a likelihood ratio of one -- the observation of the predicted pathogenic variant in this gene neither adds to nor detracts from the probability of the differential diagnosis (note that we treat known disease-associated variants differently, see below).
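
The capping rule above can be sketched as follows. This assumes the same Poisson-style likelihood as the λ values suggest; the function names are hypothetical and this is not the code that was committed to LIRICAL.

```python
import math


def capped_background_lambda(lam_disease: float, lam_background: float) -> float:
    """Apply min(lambda_D, lambda_B) only when lambda_B >= 1; otherwise leave lambda_B unchanged."""
    if lam_background >= 1.0:
        return min(lam_disease, lam_background)
    return lam_background


def poisson_density(x: float, lam: float) -> float:
    """Poisson probability generalized to real x >= 0 (assumed model)."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1.0))


def likelihood_ratio(x: float, lam_disease: float, lam_background: float) -> float:
    """Likelihood ratio after capping the background rate."""
    lam_b = capped_background_lambda(lam_disease, lam_background)
    return poisson_density(x, lam_disease) / poisson_density(x, lam_b)


# TTN-like case: lambda_D = 1, lambda_B = 9.4235, one predicted pathogenic variant.
print(likelihood_ratio(1.0, 1.0, 9.4235))  # 1.0, since both rates are now equal
```

Genes with a background rate below one are left untouched, so the cap only affects genes such as TTN in which predicted pathogenic variants are common in the general population.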

pnrobinson commented 5 years ago

I have implemented this, and it appears to take care of the TTN artefacts. I think we can close this issue, but we need to assess this heuristic once we have all the phenopackets.