NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

Treats Or Applied Or Studied To Treat - Should not be used as the Inferred Edge #865

Closed sstemann closed 1 week ago

sstemann commented 1 month ago

This is my understanding from the 7/12 TAQA review of the murals. It looks like primarily an issue with ARAX.

Environment: Test PK: 72895ed9-595c-4294-9e3d-53bd6ee2a8b5

saramsey commented 1 month ago

Paging @dkoslicki and @chunyuma; this looks like an xDTD result.

saramsey commented 1 month ago

@sstemann what edge type should be used instead? Please pardon my ignorance.

dkoslicki commented 1 month ago

I'm confused, @sstemann. I was under the impression that, after the treats refactor, this was the correct (mixin) predicate.

saramsey commented 1 month ago

Per @sierra-moxon, mixin predicates are now allowed in TRAPI responses. I have repeatedly confirmed this with the SRI team.

saramsey commented 1 month ago

If I am mistaken about the mixin matter, please DM me on Slack.

sierra-moxon commented 1 month ago

@saramsey @dkoslicki: It is totally fine to use mixin predicates in KGs. In fact, "treats" is also a mixin.

However, in most of the results from other ARAs, the "inferred" edge (knowledge_level: predicted) uses the "treats" predicate instead of the higher-level "treats_or_applied_or_studied_to_treat" predicate. Along with this "treats" inference edge, we typically see support_graphs (support paths in the UI) that show the edges that go into making that "treats" prediction. For example, we often see a "treats" edge inferred by an ARA, with support paths/edges for that inference from TMKP: TMKP uses the "treats_or_applied_or_studied_to_treat" predicate directly, and the ARA returns an inferred "treats" edge based on that TMKP edge.
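A rough sketch of the pattern described above, in Python. The layout is a simplified approximation of a TRAPI message, not the exact schema, and every identifier in it (the EXAMPLE: CURIEs, edge keys, and graph ID) is made up for illustration.

```python
# Illustrative only: CURIEs, edge keys, and graph IDs are hypothetical, and the
# field layout is a simplified stand-in for a real TRAPI knowledge graph.
knowledge_graph_edges = {
    "e_tmkp": {
        "subject": "EXAMPLE:drug",
        "predicate": "biolink:treats_or_applied_or_studied_to_treat",
        "object": "EXAMPLE:disease",
    },
    "e_inferred": {
        "subject": "EXAMPLE:drug",
        "predicate": "biolink:treats",
        "object": "EXAMPLE:disease",
        "knowledge_level": "prediction",
        "support_graphs": ["sg_1"],
    },
}

# Each support graph lists the keys of the edges that back an inference.
auxiliary_graphs = {"sg_1": {"edges": ["e_tmkp"]}}


def supporting_predicates(edge_key):
    """Return the predicates of all edges in the support graphs of one edge."""
    edge = knowledge_graph_edges[edge_key]
    predicates = []
    for graph_id in edge.get("support_graphs", []):
        for supporting_key in auxiliary_graphs[graph_id]["edges"]:
            predicates.append(knowledge_graph_edges[supporting_key]["predicate"])
    return predicates
```

Here the inferred "treats" edge points at a support graph whose only edge carries the broader predicate, which is the shape the UI renders as a support path.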

Is the idea with this result that the probability_treats attribute on this edge conveys some level of confidence that the "treats_or_applied_or_studied_to_treat" edge can be interpreted as a "treats" edge, instead of ARAX instantiating the "treats" edge directly?

dkoslicki commented 1 month ago

@sierra-moxon we went with the treats_or_applied_or_studied_to_treat predicate because our (inferred, not lookup) results are generated by a reinforcement learning approach, so we needed to pick the most general predicate we could: we can't guarantee a priori that a result is a "treats" rather than an "applied to treat" or "studied to treat".

The probability_treats attribute does indeed convey the confidence (that the ML model has) that it can be interpreted as a treats edge.

cbizon commented 1 month ago

@dkoslicki Just in terms of how this is structured, we're asking a question ?-x->B, and we're looking for answers that are more or less likely to be true. That is, we want answers of the form A-x->B, along with any support for that statement. If you have support for a different predicate y, then that's only interesting if A-y->B also supports A-x->B, especially if y is a superpredicate of x.

So if you think that the treats_or_applied predicate is the best representation then what would fit best (IMO) would be returning

A-treats->B
(supported by)
A-treats_or_whatever->B
(supported by)
more paths

I don't particularly think that middle layer buys you much, but I may not understand the subtleties of your approach.
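To illustrate why an edge with a superpredicate does not by itself answer a ?-treats->B query, here is a minimal Python sketch. The parent map is a hand-written toy covering only the predicates named in this thread; it is an assumption for illustration, not something loaded from the Biolink model, and the function names are hypothetical.

```python
# Toy predicate hierarchy (child -> parent), assumed for illustration only.
PARENT = {
    "treats": "treats_or_applied_or_studied_to_treat",
    "applied_to_treat": "treats_or_applied_or_studied_to_treat",
    "studied_to_treat": "treats_or_applied_or_studied_to_treat",
}


def is_same_or_descendant(predicate, ancestor):
    """Walk up the toy hierarchy from `predicate` looking for `ancestor`."""
    current = predicate
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def directly_answers(edge_predicate, query_predicate):
    """An edge answers a query if its predicate is the query predicate or narrower."""
    return is_same_or_descendant(edge_predicate, query_predicate)
```

Under this toy map, directly_answers("treats_or_applied_or_studied_to_treat", "treats") is False: the broader edge can only serve as support for a separate "treats" edge.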

dkoslicki commented 1 month ago

Fixed in this PR: https://github.com/RTXteam/RTX/pull/2330

@cbizon it would be great to get some guidance on how the treats refactor impacts MVP1. Our understanding might have been incorrect in replacing all treats edges with the more generic mixin in both KG2 and ARAX.

sierra-moxon commented 1 month ago

I'm going to schedule a 30 min meetup on this with y'all so that we can address the issues around "treats."

mbrush commented 1 month ago

Hi all. I jotted down my thoughts on these issues, as well as some more context around the 'treats' refactor. Looking forward to the call on Monday.

1. Re: how the refactor will affect edges in KG2

Prior to the refactor, KPs mapped knowledge from many sources to treats where the source was actually reporting something more foundational - e.g. that a drug is in a phase 2 trial for a disease, or was self-reported to be taken for a disease by 20 patients. The treats predicate was used incorrectly/imprecisely in many cases because its original definition was ambiguous/under-specified, and no other predicates were available. But these relationships do not meet our current criteria for what qualifies as a 'treats' assertion as defined in Biolink.

The treats refactor provided more precise predicates that allow us to express what these sources are actually reporting (e.g. in clinical trials for, or applied to treat). The slide deck here provides more info about the refactor and how to implement it.

To conform to the 'treats' and KL/AT refactors, KPs like KG2 needed to review their treats edges and decide which can continue to use the treats predicate as assertions (because they are consistent with the Biolink definition and requirements for asserting this relationship), and which should be 'downgraded' to use one of the new, more foundational predicates. We provided the transform/mapping guide here to help with this (note that there are still a few sources that need to be explored and mapped, e.g. MONDO, NCIT, repoDB).

2. Re: regenerating the lost 'treats' edges

It is indeed the case that the majority of treats edges were 'lost' in KPs after the treats refactor. The last piece of the puzzle is a way to 'regenerate' the lost treats edges, but as predictions that can be made based on the more foundational edges they were replaced by.

Here is where the CQS comes into play: it decides when a treats prediction may be warranted based on these more foundational facts, and creates these edges in response to creative mode queries. The CQS creates the treats predicted edge, and a support path that consists of the foundational edge it was based on - so that this provenance can be presented in the UI using the existing support path paradigm. There are currently three predominant manifestations of this in our data:

  1. X treats Y (prediction) supported by X in_clinical_trials_for Y (assertion)
  2. X treats Y (prediction) supported by X applied_to_treat Y (assertion)
  3. X treats Y (prediction) supported by X treats_or_applied_or_studied_to_treat Y

This last one in particular enables text-mined edges that previously mapped to treats to now use the weaker predicate treats_or_applied_or_studied_to_treat, which more accurately reflects the level of imprecision/uncertainty inherent in text-mined edges (we don’t know if these mined edges report a true treats relationship, or merely the fact that a researcher was studying a possible treatment, or a patient/physician tried applying a treatment for their condition).

Note that the CQS is only responsible for generating treats edges as predictions to make up for the fact that many direct treats edges in KPs like KG2 were lost in the refactor (i.e. replaced with more foundational predicates). ARAs like ARAX continue to generate their predictions as before - but need to tune their templates / train their prediction models on KGs that now include these more foundational predicates.
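The three manifestations listed above can be sketched as a single rule in Python. This is a hypothetical illustration of the idea, not the CQS implementation: the function name, the plain-dict edge layout, and the "support" field are all assumptions.

```python
# Predicates that, per the three patterns above, can back a predicted treats edge.
FOUNDATIONAL = {
    "in_clinical_trials_for",
    "applied_to_treat",
    "treats_or_applied_or_studied_to_treat",
}


def predict_treats(edges):
    """Emit one predicted 'treats' edge per (subject, object) pair that has at
    least one foundational edge, attaching those edges as its support."""
    grouped = {}
    for edge in edges:
        if edge["predicate"] in FOUNDATIONAL:
            key = (edge["subject"], edge["object"])
            grouped.setdefault(key, []).append(edge)
    return [
        {
            "subject": subject,
            "predicate": "treats",
            "object": obj,
            "knowledge_level": "prediction",
            "support": support,
        }
        for (subject, obj), support in grouped.items()
    ]
```

Edges with other predicates are ignored, so a pair with no foundational edge yields no prediction; the foundational edges themselves are left unchanged and only referenced as support.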

3. Re: the rationale that "we needed to pick the most general predicate we could as we can't a-priori guarantee that it's not an applied to treat or studied to treat and just a treats" (DK).

IMO, while ARAX does use a unique methodology to make its MVP1 predictions, at the end of the day the edges it creates can be understood as 'predictions' that a treats relationship may exist between the subject chemical and object condition. The whole idea of creative mode predictions is that we can create treats edges and signal our lower certainty by tagging them with KL = 'prediction'.

The reasoner cannot be sure whether the relationship is treats, or applied to treat, or studied to treat - but the fact that one of these is likely to exist is reason enough to make a 'treats' edge as a prediction. This is the same logic followed in making treats predictions based on text-mined edges.

As Chris B said - ARAX could be super explicit about this in its creative treats edges, and have two levels of nested support paths:

A-treats->B                                 # arax secondary prediction - based on the xDTD prediction below
(supported by)
A-treats_or_applied_or_studied_to_treat->B  # arax xDTD prediction 
(supported by)
many support paths                          # explanatory paths generated post xDTD by the actor-critic network

But I agree this middle layer doesn't buy us much and is not necessary. I think it is perfectly fine to have the ARAX xDTD model directly predict treats and follow the pattern used by all other ARAs:

A-treats->B                     # arax prediction from xDTD model directly predicts 'treats'
(supported by)
many support paths              # explanatory paths generated post xDTD by the actor-critic network

This is exactly how ARAX predictions looked before the refactor. The difference is that many of the edges in the support paths will be more informative because they more precisely express what their sources said in the first place.

4. Finally, re: any concern that the refactor may impact the ARAX ML tool, as the treats edges in Translator / RTX-KG2 are what they use to train their model.

The fact that many of these treats edges now use more precise/accurate predicates should not prevent training of the ML tool on the refactored graph, and the additional precision / distinctions they afford may even provide opportunities to more finely tune the model.

sstemann commented 1 month ago

the original ticket is resolved in Fugu/Test. @mbrush if there is something in your notes that needs to be addressed, i suggest making a specific ticket.

https://ui.test.transltr.io/main/results?l=Skin%20Vascular%20Disease&i=MONDO:0019293&t=0&r=0&q=86fc5345-6102-4dbc-abfa-42a605c72bc7

sstemann commented 1 month ago

this issue also applies to BTE

https://ui.test.transltr.io/main/results?l=Skin%20Vascular%20Disease&i=MONDO:0019293&t=0&r=0&q=645a836f-dee9-4e01-b95e-5eb7db2ab7c8

The inferred edge should be treats

colleenXu commented 4 weeks ago

There is now a fix deployed to BTE CI.

sstemann commented 1 week ago

looks resolved in prod