Re-define "inferred" vs. "lookup" vs. "direct", display results into three buckets (defined by KL/AT) "text mined", "predicted", "curated"

sstemann commented 3 months ago

These are coming via ARAX as one-hop "lookups" - in both Test and Prod

https://ui.test.transltr.io/main/results?l=Mvp1&i=MONDO:0024529&t=0&r=0&q=b49bd0d6-c3c3-4bd4-b602-f815f0bfb8f4

When merged with BTE, these end up looking as if OpenPredict inferred these responses, which seems a little funny.

dnsmith124 commented 3 months ago

I believe I can explain the behavior here:

These two edges from openpredict have the knowledge level of "predicted", but the UI primarily relies on the presence of support graphs to determine whether an edge is marked as lookup or inferred. Since neither edge from openpredict has a support graph, they're shown as lookups. Then when, in the case of the 2nd edge, two identical predicted edges are merged, the edge from openpredict shows as inferred because the BTE edge with which it was merged has a support graph.

Something needs to change with how the UI handles these edges, since the assumption that inferred edges always have a support graph is false, but my question is: should we show predicted edges at all if they have no support graph? These openpredict edges also have no EPC, making them doubly dubious in my opinion.

I could adjust the UI to display all edges with a KL of predicted as inferred regardless of whether they have support graphs, but I worry that such inferred edges will be inherently much less valuable (to the point of being potentially worthless) than inferred edges with support graphs.
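To make the behavior described above concrete, here is a minimal Python sketch. It is not the UI's actual code: the `Edge` class, its field names, and the functions `display_type`, `display_type_by_kl`, and `merge` are all hypothetical, and the rules are a simplified reading of this comment (support graph present means "inferred", otherwise "lookup"; a merged edge keeps any support graph from either side).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    # All field names are illustrative assumptions, not the real UI model.
    source: str
    knowledge_level: str
    has_support_graph: bool


def display_type(edge: Edge) -> str:
    """Current behavior as described: only the support graph matters."""
    return "inferred" if edge.has_support_graph else "lookup"


def display_type_by_kl(edge: Edge) -> str:
    """Possible adjustment: a KL of "predicted" also counts as inferred."""
    if edge.knowledge_level == "predicted" or edge.has_support_graph:
        return "inferred"
    return "lookup"


def merge(a: Edge, b: Edge) -> Edge:
    """Merge two identical edges; the result has a support graph if either did."""
    return Edge(
        source=f"{a.source}+{b.source}",
        knowledge_level=a.knowledge_level,
        has_support_graph=a.has_support_graph or b.has_support_graph,
    )


openpredict_edge = Edge("openpredict", "predicted", False)
bte_edge = Edge("bte", "predicted", True)

print(display_type(openpredict_edge))                   # lookup
print(display_type(merge(openpredict_edge, bte_edge)))  # inferred
print(display_type_by_kl(openpredict_edge))             # inferred
```

Under the first rule the same openpredict edge is labeled differently depending on whether it was merged, which is the inconsistency reported here. The second rule removes that inconsistency but labels edges as inferred even when there is no support graph to show.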

sierra-moxon commented 3 months ago

adding @mbrush for his comments on this.

mbrush commented 3 months ago

@CaseyTa when we last spoke OpenPredict was planning to add support graphs of some kind to their predictions - is this in the works still? If so, ETA?

IMO, even in the absence of support graphs these should be returned as inferred results, as support paths are pending. The UI does provide a linkout to an overview of the OpenPredict methodology - which should be sufficient to explain how the edge was generated.

@CaseyTa I would advise that until these edges do come with support paths, a line can be added to the wiki description explaining why, and that these are coming soon. This line can be removed again once support paths are added.

Genomewide commented 3 months ago

We are going to add an additional rule that 'inferred' means predicted but with support paths. This will distinguish the two predicted examples above and keep them from being merged.

sstemann commented 3 months ago

so now we will have inferred paths without support graphs?

dnsmith124 commented 3 months ago

@sstemann I'm of the mind that we shouldn't do that (temporarily or not), given the current state of the knowledge sources we link out to, but I believe that's what's being proposed.

sierra-moxon commented 3 months ago

from TAQA:

The UI doesn't check whether an edge is marked as inferred in order to display it as inferred; instead, it looks for a support graph. In this case, BTE had a support graph, so it is showing correctly. In general, edges will merge as lookups unless they are merging with another edge that has a support graph.

We really need to discuss this; "inference" is confusing, we may not want to rely on "support graphs" to determine "inference" category. One-hops are really weak inference - those don't really lead to an inference.

from UI:

lookup or direct means there is one hop. the edge exists, evidence connects two edges directly. (lookup)
in creative mode, there doesn't have to be an edge, but the support graphs suggest there could or should have an edge between them (inferred)
these edges are treated differently because we're trying to show the user there is a different level of evidence here.
but now, we have 1-hop support graphs (though this still fits the paradigm because there isn't really a treats edge; it's a "treats or applied or studied to treat" edge, and we want to infer that this is a treats edge, with "weaker" provenance than lookups). Maybe part of the confusion here is at what level the UI means inferred. I think we mean it at the reasoner level, not at the edge level.
in summary: the difference is evidence.
UI treats "lookup" as "some resource is directly claiming the edge". Sui wants "lookup" to be the above and to come from a high-confidence, authoritative source.

feedback:

maybe a different word than lookup?
we also see TMKP in lookup paths - without support graphs - so they go to lookup.

from Sarah - this is a very specific issue though - this one shows "lookup" edges from OpenPredict! predict == inference, no? this is still an issue with the definition and the display of that definition to users.
from Andy - but it's still a "lookup" from the UI POV.
from Sarah - but my definition is that lookup is a known treatment - it's an edge that I can look up in a graph, it's all about evidence. "known" is loaded.
from Kara - can we change the predicate? CQS will infer the "treats" edge from these "treats" lookup edges, but ARAX does not infer a "treats" edge here; without a support graph, it shows up as a lookup.
from Andy - we need a better way of conveying predicate + evidence details to the UI team - there are not enough hours in the day to figure all this complexity out.

action: we need to get CQS, ARAX, OpenPredict, UI, EPC together - this needs a discussion. next week CQS call.

sierra-moxon commented 3 months ago

from one-off meeting:

The predicates seem good here; we don't suggest changing them
We know there are a lot of edges that are missing KL/AT (or displaying "unknown" instead of "not_provided"?)
We need to find out if the large number of missing KL/AT values are considered the defaults in the infores catalog.
EPC group will analyze with the help of reports from Andy. (Matt will meet with Andy and figure out if any modeling changes are needed to support this.)
We know that the UI is in the middle of re-defining "inferred" vs. "lookup" vs. "direct". We don't know exactly what that looks like or if KL/AT will solve this, but we are hoping to divide results into three buckets (defined by KL/AT):
- "text mined"*
- "predicted"*
- "curated"*
please see the KL enumeration for specific bucket names.
If there need to be modeling changes on KL/AT or predicates, OpenPredict can change, but it sounds like it won't need to change at the moment. It is unlikely that OpenPredict can create hypothetical support graphs to "explain" the predictions we are making before the end of this cycle; in particular, these seem difficult to assign, and not technically helpful in solving this specific issue (it would just make these results look the same as other ARA predictions).
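As a rough illustration of the three-bucket idea, the Python sketch below sorts edges by KL/AT. Everything in it is an assumption for discussion: the dictionary keys, the specific KL/AT strings, the mapping from values to buckets, and the extra "unclassified" bucket for edges with missing KL/AT are all hypothetical and would need to be checked against the actual KL/AT enumerations.

```python
from collections import defaultdict

# Assumed KL/AT values; verify against the real enumerations before use.
TEXT_MINING_AGENTS = {"text_mining_agent"}
PREDICTED_LEVELS = {"prediction", "predicted"}
CURATED_LEVELS = {"knowledge_assertion"}
CURATED_AGENTS = {"manual_agent"}


def bucket(edge: dict) -> str:
    """Assign one edge to a display bucket from its KL/AT (hypothetical rules)."""
    kl = edge.get("knowledge_level", "not_provided")
    at = edge.get("agent_type", "not_provided")
    if at in TEXT_MINING_AGENTS:
        return "text mined"
    if kl in PREDICTED_LEVELS:
        return "predicted"
    if kl in CURATED_LEVELS and at in CURATED_AGENTS:
        return "curated"
    # Edges with missing or unrecognized KL/AT land here (an added assumption).
    return "unclassified"


def group_by_bucket(edges: list[dict]) -> dict[str, list[dict]]:
    """Group edges so that each bucket can be displayed separately."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for edge in edges:
        groups[bucket(edge)].append(edge)
    return dict(groups)


edges = [
    {"id": "e1", "knowledge_level": "prediction", "agent_type": "computational_model"},
    {"id": "e2", "knowledge_level": "not_provided", "agent_type": "text_mining_agent"},
    {"id": "e3", "knowledge_level": "knowledge_assertion", "agent_type": "manual_agent"},
    {"id": "e4"},  # no KL/AT at all
]
for name, members in group_by_bucket(edges).items():
    print(name, [e["id"] for e in members])
```

Note that this bucketing never looks at support graphs, so an OpenPredict edge with a predicted KL would land in the "predicted" bucket whether or not it was merged with a BTE edge.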

sstemann commented 2 months ago

@dnsmith124 is this happening in the UI for Guppy?

Re-define "inferred" vs. "lookup" vs. "direct", display results into three buckets (defined by KL/AT) "text mined", "predicted", "curated"

Should it really be two tickets?

NCATSTranslator / Feedback

Re-define "inferred" vs. "lookup" vs. "direct", display results into three buckets (defined by KL/AT) "text mined", "predicted", "curated" #854