Heuristic for genes with "too many" called pathogenic variants

pnrobinson commented 5 years ago

Mode of inheritance: autosomal dominant. Observed weighted pathogenic variant count: 1.80. λdisease=1. λbackground=0.0761. P(G|D)=0.2193. P(G|¬D)=0.0054. log10(LR): 1.61.

In this case, there were two called pathogenic variants, but the score is downweighted too much.
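The numbers in the output above can be checked with a short sketch. It assumes that both probabilities come from a Poisson mass function extended to non-integer counts through the gamma function; that assumption is not stated in this thread, but it reproduces the reported values to within rounding. The function name `poisson_pmf` is hypothetical and is not taken from the LIRICAL code base.

```python
import math


def poisson_pmf(count: float, lam: float) -> float:
    """Poisson mass function, extended to non-integer counts via the gamma function.

    Assumption: this is an illustrative form that matches the reported numbers,
    not a copy of the actual implementation.
    """
    return math.exp(-lam + count * math.log(lam) - math.lgamma(count + 1.0))


lambda_disease = 1.0         # expected count under the disease model (from the output above)
lambda_background = 0.0761   # expected count in the background population
observed = 1.80              # observed weighted pathogenic variant count

p_disease = poisson_pmf(observed, lambda_disease)
p_background = poisson_pmf(observed, lambda_background)
log10_lr = math.log10(p_disease / p_background)

# Numerator if only one called pathogenic variant had been counted.
p_disease_single = poisson_pmf(1.0, lambda_disease)

print(f"P(G|D)     = {p_disease:.4f}")
print(f"P(G|not D) = {p_background:.4f}")
print(f"log10(LR)  = {log10_lr:.2f}")
print(f"P(G|D) with count 1 = {p_disease_single:.4f}")
```

Under this assumption, the numerator for a count of 1.80 is smaller than the numerator for a count of 1, which is the downweighting described above.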

pnrobinson commented 5 years ago

Let us implement this heuristic instead; it seems to make more sense.

One can also observe cases in which more than one pathogenic variant is called in a gene without a high background, i.e., $\lambda^{\mathcal{B}}_g<1$. For instance, we might observe two called pathogenic variants ($c_{path}=2$) in a gene with $\lambda^{\mathcal{B}}_g=0.076$. In this case, if we enter these values into equation (\ref{eq:autosomal_dominant_lr}), the score in the numerator would be less than if only one called pathogenic variant had been observed. Because the overall frequency of called pathogenic variants in the gene in the population is low, we regard such events as technical noise and still want to give the gene a high ranking, leaving it to the user to determine whether one of the variants is genuinely related to the disease. Therefore, we set the number of observed variants to 1 in this case and assign a score equal to the maximum of the observed pathogenicity scores.
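A minimal sketch of the rule described above, under stated assumptions: the weighted count is otherwise taken to be the sum of the pathogenicity scores of the called variants, and the function name `effective_variant_count` and its parameters are hypothetical. This is one reading of the paragraph, not the code that was committed to LIRICAL.

```python
from typing import Sequence


def effective_variant_count(pathogenicity_scores: Sequence[float],
                            lambda_background: float) -> float:
    """Weighted count of called pathogenic variants for an autosomal dominant gene.

    Assumptions (illustrative only):
    - each entry is the pathogenicity score of one called pathogenic variant;
    - without the heuristic, the weighted count is the sum of those scores.
    """
    if not pathogenicity_scores:
        return 0.0
    if len(pathogenicity_scores) > 1 and lambda_background < 1.0:
        # More than one called variant in a gene with low background:
        # treat the extras as technical noise, count a single variant,
        # and score it with the highest observed pathogenicity.
        return max(pathogenicity_scores)
    # Other cases (a single variant, or a high-background gene) are
    # outside the scope of this heuristic and are left unchanged here.
    return sum(pathogenicity_scores)


# Two called variants in a gene with a background of 0.076, as in the example above.
count = effective_variant_count([0.9, 0.9], lambda_background=0.076)
print(count)
```

The scores of 0.9 in the usage example are made up for illustration; the thread only reports the combined weighted count of 1.80.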

pnrobinson commented 5 years ago

fixed

TheJacksonLaboratory / LIRICAL

Heuristic for genes with "too many" called pathogenic variants #265