NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

All creative/inferred queries should also show lookup results (from all ARAs): lookup + creative cap that is independently maintained. #281

Closed sstemann closed 8 months ago

sstemann commented 1 year ago

Per TAQA at the RENCI June Relay, and Architecture https://github.com/NCATSTranslator/TranslatorArchitecture/pull/88

  1. Query Modes:
    1. As described in the TRAPI specification, edges may be queried in either "lookup" or "inferred" mode.
    2. KPs and ARAs must respond to lookup queries by treating the query as an exact database match.
    3. ARAs must respond to inferred mode one-hops with relevant results beyond an exact database match; KPs may also provide this capability
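
To make the two query modes above concrete, here is a minimal sketch of a one-hop query graph in each mode. It assumes the TRAPI `knowledge_type` edge property is what carries the mode; the helper name `one_hop_query`, the node and edge keys, and the example disease CURIE are illustrative choices, not something taken from this thread.

```python
import json


def one_hop_query(disease_curie: str, knowledge_type: str) -> dict:
    """Build a one-hop 'drug treats disease' query graph (illustrative helper)."""
    if knowledge_type not in ("lookup", "inferred"):
        raise ValueError(f"unknown knowledge_type: {knowledge_type!r}")
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    # Node keys "drug" and "disease" are arbitrary labels.
                    "drug": {"categories": ["biolink:ChemicalEntity"]},
                    "disease": {"ids": [disease_curie]},
                },
                "edges": {
                    "treats": {
                        "subject": "drug",
                        "object": "disease",
                        "predicates": ["biolink:treats"],
                        # "lookup": exact database match only.
                        # "inferred": results beyond an exact match are expected.
                        "knowledge_type": knowledge_type,
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # The CURIE below is only a placeholder for some disease identifier.
    print(json.dumps(one_hop_query("MONDO:0005148", "inferred"), indent=2))
```

The only difference between the two modes in this sketch is the value of `knowledge_type` on the edge; everything else in the query graph stays the same.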
sierra-moxon commented 1 year ago

From TAQA: ARAGORN/ROBOKOP does this, BTE does, Unsecret does.

sierra-moxon commented 1 year ago

In Architecture, we will merge the "requirement/rule" and then close this issue.

sstemann commented 1 year ago

While I think the issue to enforce the requirement is merged, when do we expect to be able to verify which ARAs have met this requirement in Test?

dkoslicki commented 1 year ago

And to chime in on this: while BTE and ARAGORN report they are returning lookup results in addition to creative-mode results, in practice, they are not. O&O work shows that only about half of all disease queries return one or more drugs supported by at least two ARAs (and just 4% of all drug-disease pairs have multiple ARAs supporting them). It may be that the known treats are "falling off the edge" of the results when ARAs truncate results.

andrewsu commented 1 year ago

It may be that the known treats are "falling off the edge" of the results when ARAs truncate results.

I'd have to look at specific examples to be sure, but this is definitely known behavior for BTE. We definitely do the lookup, but there is no guarantee that all the lookup results are included in our top 500. I don't think it's always a bug per se, but it is something that will become less frequent when we complete our solution to https://github.com/biothings/biothings_explorer/issues/634.
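
One way to picture the "lookup + creative cap that is independently maintained" idea from the issue title is the sketch below. It is not BTE's or any other ARA's actual implementation; the `Result` type, its fields, and the cap values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    drug: str
    score: float
    is_lookup: bool  # hypothetical flag: True if backed by a lookup edge


def cap_independently(results, lookup_cap, creative_cap):
    """Truncate lookup and creative results separately, then merge by score."""
    ranked = sorted(results, key=lambda r: r.score, reverse=True)
    lookups = [r for r in ranked if r.is_lookup][:lookup_cap]
    creative = [r for r in ranked if not r.is_lookup][:creative_cap]
    return sorted(lookups + creative, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    results = [
        Result("a", 0.9, False),
        Result("b", 0.8, False),
        Result("c", 0.7, False),
        Result("known", 0.1, True),
    ]
    # A single shared cap of 3 would keep a, b, c and drop the low-scoring
    # lookup result; separate caps keep it.
    for r in cap_independently(results, lookup_cap=1, creative_cap=2):
        print(r.drug, r.score, r.is_lookup)
```

With a single shared cap, a low-scoring lookup result competes with every creative result for a slot; with separate caps it only competes with other lookup results.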

sierra-moxon commented 1 year ago

@capasfield - is this something that can find its way onto the architecture agenda? :) I think it can be closed, but we should probably check with folks directly first?

tursynay commented 1 year ago

Aragorn - Yes

BTE - Yes

ARAX - Working on it, Penn State is working on the issue

Unsecret - Yes

Improve - Challenge in ranking, but returning results

mbrush commented 1 year ago

We definitely do the lookup, but there is no guarantee that all the lookup results are included in our top 500.

This raises again the interesting question of how Results that are based on a lookup / assertion / known fact should be scored/ranked relative to Results that are not (i.e. are only supported by creative prediction). If our scores/ordering are meant to reflect confidence that the result is correct, I would argue that results based on looked-up known facts should (almost) always score higher than those based only on creative inference/prediction.

If/how/where we factor the 'knowledge level' of the edges on which a Result is based into its final score is an important but overlooked question that has come up in several settings.
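
As a rough sketch of the ordering argued for above, results could be sorted by knowledge level first and score second. This is a strict tiering, which is stronger than the "(almost) always" in the comment, and the field names `knowledge_level` and `score` and their values are hypothetical.

```python
def rank_results(results):
    """Order lookup-backed results ahead of inferred-only ones, then by score."""
    return sorted(
        results,
        # False sorts before True, so lookup-backed results come first;
        # within each tier, higher scores come first.
        key=lambda r: (r["knowledge_level"] != "lookup", -r["score"]),
    )


if __name__ == "__main__":
    example = [
        {"name": "p", "score": 0.95, "knowledge_level": "inferred"},
        {"name": "k", "score": 0.40, "knowledge_level": "lookup"},
        {"name": "q", "score": 0.50, "knowledge_level": "inferred"},
    ]
    for r in rank_results(example):
        print(r["name"], r["knowledge_level"], r["score"])
```

A softer variant would add a bonus to the score of lookup-backed results instead of tiering, which would let a very confident prediction outrank a weakly supported known fact.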

dkoslicki commented 1 year ago

By the way, as indicated here, ARAX has this fixed as of 3 weeks ago

sstemann commented 8 months ago

Lookups are included with inferred results. It may be that not all ARAs are including both, but I am marking this done; we can open specific ARA issues as needed.