RobokopU24 / Feedback

Feedback on the ROBOKOP project
https://robokop.renci.org

Subclass diseases do not make sense #142

Closed: karafecho closed 1 year ago

karafecho commented 1 year ago

The following query is returning results that include diseases other than cardiovascular disorders: polybrominated diphenyl ether - related_to - Gene - related_to - cardiovascular disorder. For example, the top answer is for complete gonadal dysgenesis.

image
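For readers who want to reproduce the shape of this query, here is a minimal sketch of the three-node pattern as a TRAPI-style query graph. The identifiers `EXAMPLE:pbde` and `EXAMPLE:cardiovascular_disorder` are placeholders, not real CURIEs, and the helper name `build_query_graph` is hypothetical. The thread does not say which identifiers or categories were actually used, so the categories below are assumptions too.

```python
import json


def build_query_graph(chemical_id: str, disease_id: str) -> dict:
    """Build a chemical - related_to - Gene - related_to - disease query.

    Both ids are supplied by the caller; nothing here looks them up.
    """
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    # Pinned chemical node (placeholder id in the usage below)
                    "n0": {"ids": [chemical_id], "categories": ["biolink:ChemicalEntity"]},
                    # Unpinned gene node: any gene may fill this slot
                    "n1": {"categories": ["biolink:Gene"]},
                    # Pinned disease node; a reasoner may also match its subclasses
                    "n2": {"ids": [disease_id], "categories": ["biolink:Disease"]},
                },
                "edges": {
                    "e0": {"subject": "n0", "object": "n1", "predicates": ["biolink:related_to"]},
                    "e1": {"subject": "n1", "object": "n2", "predicates": ["biolink:related_to"]},
                },
            }
        }
    }


# Placeholder identifiers, for illustration only
query = build_query_graph("EXAMPLE:pbde", "EXAMPLE:cardiovascular_disorder")
print(json.dumps(query, indent=2))
```

Because `n2` is pinned to a high-level disease term, any subclass expansion applied by the reasoner decides how broad the disease results can be.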

karafecho commented 1 year ago

Also see #139.

cbizon commented 1 year ago

Complete Gonadal Dysgenesis is a subclass of cardiovascular disease in MONDO: image

cbizon commented 1 year ago

So I'm not sure that this is a problem?

karafecho commented 1 year ago

Interesting. I had looked up complete gonadal dysgenesis before posting this ticket and did not see an obvious relationship to cardiovascular disorder. I just checked Orphanet's hierarchy and it, too, lists complete gonadal dysgenesis as a rare cardiac disease: https://www.orpha.net/consor/cgi-bin/Disease_Classif.php?lng=EN&data_id=146&PatId=1044&search=Disease_Classif_Simple&new=1. So, I guess this result is accurate.

There are three publications supporting the decabromodiphenyl oxide - NR5A1 edge. There isn't any publication support for the NR5A1 - complete gonadal dysgenesis edge, but Pharos is making the assertion and OmniCorp is supporting it with 45 co-occurrences, so I suppose the edge is sufficiently supported.

I'd be curious to (1) compare these results (automat.renci.org/#/robokopkg) with those from ExEmPLAR (robokopkg.renci.org) and (2) get a reaction from the expert who submitted the question. She was interested in cardiotoxicity, which I was unable to include as a node (I tried a bunch of entities/CURIEs), so I'm guessing this answer, while valid, wouldn't align with her expectations/interests. Then again, I'm not sure it's worth sharing these results, as one might argue that the query didn't accurately capture the question.

In some sense, this issue relates to the 'subclass' issues noted in #139 and elsewhere.

karafecho commented 1 year ago

ExEmPLAR results (for comparison):

image

image

karafecho commented 1 year ago

ROBOKOP results for decabromodiphenyl oxide only:

image

cbizon commented 1 year ago

ExEmPLAR (IIRC) is doing a text-based search for nodes. It is not reasoning over subclass_of edges, nor is it using identifiers. So there will be differences based on those things.

Is the issue here that we just don't like this subclass result? To some degree we're at the mercy of what the data says. The only other way I can think of to affect this is by downweighting edges that are supported by subclass_of inferences.
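As a rough illustration of the downweighting idea, the sketch below multiplies a result's score by a penalty when the disease matched only through a subclass inference. The `Result` type, the `via_subclass` flag, the `subclass_penalty` parameter, and the scores are all made up for this example. This is not how ROBOKOP scores answers.

```python
from dataclasses import dataclass


@dataclass
class Result:
    disease: str
    base_score: float   # illustrative score, not a real ROBOKOP value
    via_subclass: bool  # True if the disease matched only via subclass inference


def adjusted_score(result: Result, subclass_penalty: float = 0.5) -> float:
    """Scale the score down when the match relied on a subclass inference."""
    if not 0.0 <= subclass_penalty <= 1.0:
        raise ValueError("subclass_penalty must be between 0 and 1")
    if result.via_subclass:
        return result.base_score * subclass_penalty
    return result.base_score


def rank(results: list[Result], subclass_penalty: float = 0.5) -> list[Result]:
    """Sort results from highest to lowest adjusted score."""
    return sorted(results, key=lambda r: adjusted_score(r, subclass_penalty), reverse=True)


# Toy data: the inferred match starts with the higher raw score
results = [
    Result("complete gonadal dysgenesis", base_score=0.9, via_subclass=True),
    Result("cardiovascular disorder", base_score=0.6, via_subclass=False),
]
ranked = rank(results)
print([r.disease for r in ranked])
```

With the default penalty the inferred match drops below the direct one. Setting `subclass_penalty=1.0` leaves the original ordering unchanged.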

karafecho commented 1 year ago

Yes, I completely expect differences between ROBOKOP and ExEmPLAR (they aren't even querying the same KG); I was just comparing the two result sets, as noted.

I think we need to provide users with control over subclass inferences, especially when high-level nodes such as "cardiovascular disorder" are selected. I will close this ticket and comment on #139.
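One way to picture user control over subclass inference is a switch plus a depth limit on how far a query term is expanded. The sketch below assumes that design. The hierarchy is a toy: the intermediate class names are invented, and only the fact that complete gonadal dysgenesis sits somewhere under the cardiovascular term comes from this thread. The function `expand_term` and its parameters are hypothetical.

```python
from collections import deque

# Toy hierarchy (parent -> direct subclasses); intermediate names are invented
TOY_SUBCLASSES = {
    "cardiovascular disorder": ["toy heart disorder", "toy intermediate class"],
    "toy intermediate class": ["complete gonadal dysgenesis"],
}


def expand_term(term, hierarchy, include_subclasses=True, max_depth=None):
    """Return the term plus the subclasses a query is allowed to match.

    include_subclasses=False disables expansion entirely.
    max_depth limits how many subclass levels below the term are followed;
    None means no limit.
    """
    if not include_subclasses:
        return [term]
    matched = [term]
    seen = {term}
    queue = deque([(term, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not descend below the allowed depth
        for child in hierarchy.get(node, []):
            if child not in seen:
                seen.add(child)
                matched.append(child)
                queue.append((child, depth + 1))
    return matched


print(expand_term("cardiovascular disorder", TOY_SUBCLASSES, include_subclasses=False))
print(expand_term("cardiovascular disorder", TOY_SUBCLASSES, max_depth=1))
print(expand_term("cardiovascular disorder", TOY_SUBCLASSES))
```

In this toy setup, a depth limit of 1 keeps the direct subclasses but excludes complete gonadal dysgenesis, which only appears with unlimited expansion.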