TheJacksonLaboratory / LIRICAL

LIkelihood Ratio Interpretation of Clinical AbnormaLities
https://thejacksonlaboratory.github.io/LIRICAL/stable

Phenotype matching and diseases missing phenotypes #534

Closed oleraj closed 4 years ago

oleraj commented 4 years ago

Hi Peter,

We did some more testing with LIRICAL and noticed some issues with the phenotype matching. One example is a CTLA4 patient who had the following phenotype terms:

['HP:0000138','HP:0000964','HP:0001369','HP:0002019','HP:0002020','HP:0002028','HP:0002076','HP:0002721','HP:0002829','HP:0002900','HP:0003270','HP:0012317','HP:0012395','HP:0031292','HP:0100594','HP:0100785']

LIRICAL ranks CTLA4 near the top but the likelihood ratio is not very good.

The two matching phenotypes are 'Chronic diarrhea' and 'Eczema'. Some of the other phenotypes may be unrelated to the disorder, and we noticed that they are counted against the diagnosis rather than being treated as neutral. I think that assumes the phenotypic description of each disorder is perfectly complete, which is probably not the case. It also makes LIRICAL tricky to run: if every phenotype listed in our database for a patient that is not associated with the disorder counts against the diagnosis, we would have to go through the list manually to see which ones are directly related to the disorder, which is hard. I can see potential issues for cases with a dual diagnosis as well, since one cluster of phenotypes might be related to one diagnosis and a separate cluster to the other. When they are combined, the unrelated phenotypes are counted against both diagnoses, so the LR would be low in both cases. I'm not sure if there's an easy way to deal with this. (As a side note, we tested two cases that had a dual diagnosis. In both cases, both diagnoses were ranked in the top 10 genes by Exomiser, but the second diagnosis wasn't found in the top 10 for LIRICAL in either case.)
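To make the concern above concrete, here is a toy sketch of how per-term likelihood ratios could combine. It is not LIRICAL's actual model: the constants `BACKGROUND_FREQ` and `UNANNOTATED_FREQ`, the 0.5 annotation frequencies, and the function names are all assumptions chosen for illustration. The only point is that, in any scheme of this shape, each observed term the disease is not annotated with contributes a negative log LR, so a few unrelated terms can cancel out the matching ones.

```python
import math

# Hypothetical numbers for illustration only; these are not LIRICAL's parameters.
BACKGROUND_FREQ = 0.05   # assumed chance of observing a term in an unrelated patient
UNANNOTATED_FREQ = 0.01  # assumed chance of observing a term the disease is NOT annotated with


def term_log10_lr(term, disease_freqs):
    """log10 likelihood ratio for a single observed term under the toy model."""
    p_given_disease = disease_freqs.get(term, UNANNOTATED_FREQ)
    return math.log10(p_given_disease / BACKGROUND_FREQ)


def composite_log10_lr(observed_terms, disease_freqs):
    """Sum of per-term log10 LRs (equivalent to multiplying the LRs)."""
    return sum(term_log10_lr(t, disease_freqs) for t in observed_terms)


# Toy disease annotated with only the two matching terms from this case
# (the 0.5 frequencies are made up).
toy_disease = {"HP:0002028": 0.5, "HP:0000964": 0.5}  # Chronic diarrhea, Eczema

matching_only = ["HP:0002028", "HP:0000964"]
# Same patient plus three terms the toy disease is not annotated with.
with_unrelated = matching_only + ["HP:0100594", "HP:0000138", "HP:0002076"]

print(f"{composite_log10_lr(matching_only, toy_disease):.2f}")   # 2.00
print(f"{composite_log10_lr(with_unrelated, toy_disease):.2f}")  # -0.10
```

Under these assumed numbers, three unannotated terms are enough to turn a clearly positive score into a slightly negative one, which is the behavior described above for both incomplete annotations and dual diagnoses.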

There are a few phenotypes that are related to the disorder but are also counted against the diagnosis, namely 'Arthritis' (HP:0001369), 'Immunodeficiency' (HP:0002721) and 'Arthralgia' (HP:0002829). "Autoimmune arthritis" is specifically listed in the OMIM description, so it's surprising that 'Arthritis' is counted against the diagnosis. "Immunodeficiency" and "Arthralgia" aren't specifically listed in the OMIM description, but immunodeficiency is implied since CTLA4 deficiency is an immunodeficiency, and arthralgia is similar to arthritis. Any thoughts on this? Does LIRICAL require exact matching of phenotypes, or can similar phenotypes also contribute to the score as in Exomiser?
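The distinction being asked about can be sketched with a tiny made-up ontology. Everything here is hypothetical: the `TOY:` identifiers, the hierarchy in `TOY_PARENTS`, and the three matching functions are invented for illustration and are not taken from HPO, LIRICAL, or Exomiser. The sketch only shows how "exact", "ancestor-aware", and "similar term" matching can give different answers for the same query.

```python
# Toy child -> parent map; the IDs and hierarchy are invented, not taken from HPO.
TOY_PARENTS = {
    "TOY:autoimmune_arthritis": "TOY:arthritis",
    "TOY:arthritis": "TOY:joint_abnormality",
    "TOY:arthralgia": "TOY:joint_abnormality",
    "TOY:joint_abnormality": None,
}


def ancestors(term):
    """Return the term itself plus all of its ancestors up to the root."""
    seen = set()
    while term is not None:
        seen.add(term)
        term = TOY_PARENTS.get(term)
    return seen


def exact_match(query, annotations):
    """True only if the query term itself is annotated to the disease."""
    return query in annotations


def induced_match(query, annotations):
    """True if the query equals an annotated term or is an ancestor of one."""
    return any(query in ancestors(a) for a in annotations)


def shares_ancestor(query, annotations):
    """True if the query and some annotated term have a common ancestor."""
    return any(ancestors(query) & ancestors(a) for a in annotations)


# Toy disease annotated only with the most specific arthritis term.
disease = {"TOY:autoimmune_arthritis"}

print(exact_match("TOY:arthritis", disease))       # False
print(induced_match("TOY:arthritis", disease))     # True
print(induced_match("TOY:arthralgia", disease))    # False
print(shares_ancestor("TOY:arthralgia", disease))  # True
```

In a real ontology with a single root, every pair of terms shares some ancestor, so similarity-based tools typically weight the shared ancestor by how specific it is rather than using a yes/no test like `shares_ancestor`.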

We wondered where LIRICAL's annotations come from, so we went to the list of phenotypes for this disorder on the hpo.jax website and found that no form of arthritis was included. Assuming this is what LIRICAL is using, that could be why 'Arthritis' was counted against the diagnosis instead of in favor of it. We also noticed that a lot of other annotations from OMIM were missing from the description on hpo.jax, e.g., "Decreased levels of naive T cells" (HPO should probably have a more specific phenotype for this, but it seems like it should at least match HP:0011839, "Abnormal T cell count"), "Decreased memory B cells" (again, HPO should probably have a more specific phenotype for this, but the closest existing term might be HP:0010976, "B lymphocytopenia"), "Granulomatous lymphocytic interstitial lung disease" and "Lymphocytic infiltration of the brain" (not sure what these last two would match to in HPO). We wondered whether these disease curations from OMIM were done before some of these HPO terms were added to the ontology, which would explain why they were missed in the curation. Or maybe they were just missed because not enough synonyms were present to pick them up?

FWIW, Exomiser was able to identify CTLA4 as the top-ranked gene for this patient, although it actually matched to a different disorder than the two that were tested in LIRICAL. Here are the disorder and matched phenotypes found in Exomiser:

Granulomatosis with polyangiitis (ORPHA:900): Ovarian cyst (HP:0000138)-Prostatitis (HP:0000024), Eczema (HP:0000964)-Skin rash (HP:0000988), Arthritis (HP:0001369)-Sinusitis (HP:0000246), Constipation (HP:0002019)-Gastrointestinal hemorrhage (HP:0002239), Gastroesophageal reflux (HP:0002020)-Intestinal obstruction (HP:0005214), Chronic diarrhea (HP:0002028)-Nausea and vomiting (HP:0002017), Migraine (HP:0002076)-Headache (HP:0002315), Immunodeficiency (HP:0002721)-Autoimmunity (HP:0002960), Arthralgia (HP:0002829)-Arthralgia (HP:0002829), Hypokalemia (HP:0002900)-Elevated C-reactive protein level (HP:0011227), Abdominal distention (HP:0003270)-Nausea and vomiting (HP:0002017), Sacroiliac arthritis (HP:0012317)-Sinusitis (HP:0000246), Seasonal allergy (HP:0012395)-Autoimmunity (HP:0002960), Cutaneous abscess (HP:0031292)-Skin rash (HP:0000988), Esophageal web (HP:0100594)-Intestinal obstruction (HP:0005214),

Maybe LIRICAL doesn't use diseases from Orphanet?

Thanks,

Andrew

pnrobinson commented 4 years ago

@oleraj We are talking with Sandhya about a POET web portal to enable community annotation -- this is an example of what we need to fix. At first glance, the annotations for this disease are incomplete in our system. We are also expecting a lot of new annotations from European collaborators. At the moment, I think this is basically a limitation of the method -- it works well if the annotations are good but less so if the annotations are not. I will do an experiment of updating the annotations for this disease and if you like you could run this case through LIRICAL again. Obviously this does not fix all annotations, but it would show what could be gained. In general, Exomiser should be less sensitive to missing annotations because of its statistical model.

oleraj commented 4 years ago

Sure, that would be great. Happy to re-run this case after you update the annotations.

pnrobinson commented 4 years ago

@oleraj Many of the phenotypic abnormalities, e.g., esophageal web, have not previously been described with this disease. See:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4668597/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4371526/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6215742/

They could either be unrelated complications or an expansion of the phenotypic spectrum of this disease. For instance, one might search for alternative etiologies for esophageal web, ovarian cyst, migraine, and hypokalemia, or check the data entry (was hypokalemia a one-time event or is it a chronic condition -- if chronic, it has not been described so far with this disease).

I was able to add many annotations. Some of the above-mentioned HPO terms were added (they were missing before). If you use the LIRICAL download function, you will get the new annotation file.

oleraj commented 4 years ago

I downloaded the new annotation and re-ran this case. CTLA4 is now the top-ranked gene and the LR is better, though still negative, possibly due to these additional phenotypes.
And "Arthritis" and "Immunodeficiency" are now counted towards the diagnosis.

At this point I'm not certain whether these are unrelated complications or an expansion of the phenotypic spectrum. I see the predicament this presents, though. We'll just have to keep that in mind as we use LIRICAL, and hopefully there will be a reliable way to keep these disease annotations updated in the future (POET, etc.).

pnrobinson commented 4 years ago

Thanks for the feedback -- this is a limitation of LIRICAL and probably of any algorithm that relies on phenotype matching. I think LIRICAL is functioning as expected with this data.