Closed saramsey closed 1 week ago
I am tagging @dkoslicki and @edeutsch to see if this can be brought up in the Architecture Committee meeting. I think the best remedy here would be for MolePro to stop doing this kind of "superclass reasoning", at least by default. The second-best remedy would be for MolePro to make "superclass reasoning" something that can be turned off by specifying an option somewhere in the TRAPI query graph.
Another example of MolePro results where superclass reasoning is evident (provided in the Agenda for the Aug. 3 AHM) can be seen in this ARS query result link, where one should go to the ARAX results. We see all kinds of drugs in the result list that are connected to high-level disease concepts like "rheumatic disorder", "psychiatric disorder", "disease or disorder", "neurodegenerative disease", "immune systems disease", basically the whole gamut.
Marking this "high priority" now that it has come up in multiple Translator stand-up meetings or Question-of-the-Month sessions.
It may be that MolePro has already inferred and stored all of these "superclass-reasoned" triples internally, in which case it may be more difficult for them to filter them out. We should try to ascertain whether this is the case, as it may be relevant to what kind of remedy we can hope to get.
In the short term, should we remove MolePro as a KP? That is a trivial change, easy to undo once a solution is settled on.
I'm open to the idea. Would like to get David and Eric's take on it as well.
Building on your idea, perhaps we could temporarily disable MolePro in the production system but keep a "with MolePro" version handy (e.g., in /beta or /test or whatever) should we need it, for example, to support ongoing discussions with the MolePro team about the effect of superclass reasoning on ARAX results.
It is a bit of a drastic move, but I would support it.
I wonder if a good way to handle this is to compute a relevance score for each received edge? Although it would be a fair bit of work, we could consider each edge we receive and see if it is relevant. If we ask for information about type 1 diabetes and get back answers for immune disorder, perhaps we could keep it, but downweight it substantially as not very relevant. This would be for pinned nodes in queries. If we had easy access to ontologies, we could compute how relevant the returned node is relative to the pinned nodes. Exact matches are highly relevant. Children are less relevant but okay. Ancestors are severely downweighted, perhaps by a factor of 2 in each generation. This could allow such ancestor reasoning to stay, but be downweighted as not very relevant to the question. If there are relevant answers, they get prioritized and these ancestor relationships are way down the ranking. If there's nothing relevant, then the less relevant things are at the top.
Perhaps a good abstract question is: if I ask ARAX about specific disease X, and there's nothing highly relevant for X or children of X, might I be interested in generic things for ancestors of X like immune disorders, or do I want nothing in response? Nothing, or something less relevant?
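To make the scoring idea above a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `PARENTS` map and its `EX:` CURIEs are made up (a real version would load the hierarchy from an ontology such as MONDO), the function names are hypothetical and not part of ARAX, and the child weight of 0.8 is just a placeholder. Only the "factor of 2 per ancestor generation" comes from the suggestion itself.

```python
from typing import Dict, List, Optional

# Hypothetical child -> parents map; a real one would come from an ontology.
PARENTS: Dict[str, List[str]] = {
    "EX:brittle_type_1_diabetes": ["EX:type_1_diabetes"],
    "EX:type_1_diabetes": ["EX:autoimmune_disease"],
    "EX:autoimmune_disease": ["EX:immune_disorder"],
    "EX:immune_disorder": ["EX:disease"],
}


def generations_up(start: str, target: str,
                   parents: Dict[str, List[str]]) -> Optional[int]:
    """Return the fewest parent hops from start up to target, or None."""
    frontier = [start]
    seen = {start}
    depth = 0
    while frontier:
        if target in frontier:
            return depth
        next_frontier = []
        for node in frontier:
            for parent in parents.get(node, []):
                if parent not in seen:
                    seen.add(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
        depth += 1
    return None


def relevance(pinned: str, returned: str, parents: Dict[str, List[str]],
              child_weight: float = 0.8) -> float:
    """Score a returned node against the pinned query node."""
    if returned == pinned:
        return 1.0  # exact match: highly relevant
    if generations_up(returned, pinned, parents) is not None:
        return child_weight  # returned node is a descendant: okay
    hops = generations_up(pinned, returned, parents)
    if hops is not None:
        return 0.5 ** hops  # ancestor: halve the score per generation
    return 0.0  # no hierarchical relationship found


if __name__ == "__main__":
    for curie in ["EX:type_1_diabetes", "EX:brittle_type_1_diabetes",
                  "EX:immune_disorder", "EX:disease"]:
        print(curie, relevance("EX:type_1_diabetes", curie, PARENTS))
```

With scores like these, ancestor-based edges would not be dropped; they would simply sort below exact and child matches whenever those exist.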
I'd actually prefer to have something less relevant than nothing.
What's the status on this? Did we end up removing MolePro from Expand?
No, no action was taken; we still currently use MolePro.
See https://github.com/NCATSTranslator/Feedback/issues/148 for a possible additional issue caused by superclass reasoning
So the MolePro team says they have fixed the issue with their disease hierarchy reasoning. We might want to double-check if this issue has "gone away" in the latest version of MolePro queried via ARAX.
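One way such a double-check could be sketched: scan the returned edges and flag any that do not touch a pinned CURIE but do touch one of its ancestors. The code below is only an illustration under stated assumptions. The edge dictionary (edge id mapped to `subject`/`object`) is loosely modeled on a TRAPI knowledge graph, the `EX:` identifiers (including the intermediate node between MONDO:0100232 and MONDO:0042489) are placeholders, and `flag_superclass_edges` is a hypothetical helper, not an existing ARAX function.

```python
from typing import Dict, Iterable, List, Set

# Assumed hierarchy: MONDO:0042489 sits two levels above MONDO:0100232.
# "EX:intermediate" is a placeholder for the level in between.
PARENTS: Dict[str, List[str]] = {
    "MONDO:0100232": ["EX:intermediate"],
    "EX:intermediate": ["MONDO:0042489"],
}

PINNED = ["MONDO:0100231", "MONDO:0100232"]

# Made-up edges for illustration only.
EDGES: Dict[str, dict] = {
    "e1": {"subject": "EX:compound_a", "object": "MONDO:0042489"},
    "e2": {"subject": "EX:compound_b", "object": "MONDO:0100232"},
    "e3": {"subject": "EX:compound_c", "object": "EX:unrelated_disease"},
}


def ancestors(curie: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every ancestor of curie by walking the parent map."""
    found: Set[str] = set()
    stack = [curie]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def flag_superclass_edges(edges: Dict[str, dict], pinned: Iterable[str],
                          parents: Dict[str, List[str]]) -> List[str]:
    """Return ids of edges that hit an ancestor of a pinned node
    without touching any pinned node directly."""
    pinned_set = set(pinned)
    broader: Set[str] = set()
    for curie in pinned_set:
        broader |= ancestors(curie, parents)
    broader -= pinned_set
    flagged = []
    for edge_id, edge in edges.items():
        ends = {edge["subject"], edge["object"]}
        if ends & pinned_set:
            continue  # edge is attached to the node we actually asked about
        if ends & broader:
            flagged.append(edge_id)
    return sorted(flagged)


if __name__ == "__main__":
    print(flag_superclass_edges(EDGES, PINNED, PARENTS))
```

An empty list for a re-run of the original query would suggest the fix has been deployed; a non-empty list would point to the specific edges worth raising with the MolePro team.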
Take a look at: https://arax.ncats.io/?r=135342, specifically things like result 14, where superclass reasoning is used. Maybe their fix hasn't been deployed yet?
The problem still occurs with https://arax.ncats.io/?r=135342, and answers like 4-aminophenylarsenoxide are not related to the query. You can disregard my earlier note; the answer is not more helpful than having nothing at all. It's just that edges were displaying on top of each other. Here they are spread out.
At the same time, I expect people will query with "Psoriatic Arthritis", not "Susceptibility to Psoriatic Arthritis".
Thank you @jh111 and @dkoslicki for pointing out this issue is still going on. I have reached out to the MolePro team via Slack DM and via a comment on NCATSTranslator/Feedback issue 148, to find out which ITRB service maturity level their fix was deployed to (test, dev, or prod).
Update: turns out this was a subtle issue between ARAGORN/Automat and MolePro. Apparently an Automat fix will be up within a week, and then MolePro will re-build and push a fix.
Thank you @dkoslicki!
Seems to be fixed now for the test query.
The following query,
(which came out of the Translator Question of the Month session today), is generating some results (see ARAX results 55221) that show a loss of semantic precision that I think derives from the "superclass reasoning" that MolePro is doing (which was discussed at length in the Expander Agent all-hands meeting on Aug. 3, 2022; see also RTX issue 1855, which I think has the same root cause as this issue). The aforementioned query is asking for concepts (any concepts) that are biolink:related_to the disease "susceptibility to psoriatic arthritis" (coded as a query node with a pair of CURIEs, MONDO:0100231 and MONDO:0100232). Seems straightforward, but we are seeing a bunch of organic compounds returned that are not related to "susceptibility to psoriatic arthritis" but instead are related to MONDO:0042489 ("disease susceptibility"):
Note, MONDO:0042489 is two levels higher in the MONDO hierarchy than MONDO:0100232, as shown here: