NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

Combinatorial explosion in the number of answers returned to a query #33

Status: Closed (karafecho closed 5 months ago)

karafecho commented 2 years ago

This issue is to formally report a known Translator issue, namely, a tendency for answer sets to explode combinatorially with certain types of queries.

For instance, during the October 2022 QotM, Translator team members found that querying for direct connections between ATP1A3 and chemical entities or diseases yields a reasonable number of results; however, adding intermediary genes and pathways causes the answer sets to explode and become unmanageable.

Example from comment posted by @colleenXu here:

"Not sure how to get from ATP1A3 -> related genes -> ChemicalEntity, Procedure, Treatment in a way that doesn't explode / become unmanageable

Pathways / BiologicalProcessOrActivity...caused explosions since they were linked to pathways that had lots of genes"
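To make the explosion described above concrete, here is a minimal sketch that counts ATP1A3 -> pathway -> related gene -> chemical paths on a toy graph. Everything in it is an assumption for illustration: the pathway names, the gene names other than ATP1A3, and the figure of 5 chemicals per gene are invented and do not come from Translator data.

```python
# Hypothetical toy data (not real Translator content): pathway membership lists.
pathway_members = {
    "pathway:small": ["ATP1A3", "GENE_A"],
    "pathway:hub": ["ATP1A3"] + [f"GENE_{i}" for i in range(200)],
}
# Assumption: every related gene links to the same number of chemicals.
chemicals_per_gene = 5


def count_paths(start_gene, pathway_members, chemicals_per_gene):
    """Count start_gene -> pathway -> related gene -> chemical paths, per pathway."""
    counts = {}
    for pathway, genes in pathway_members.items():
        if start_gene not in genes:
            continue
        related = [g for g in genes if g != start_gene]
        # Each related gene contributes one path per chemical it links to.
        counts[pathway] = len(related) * chemicals_per_gene
    return counts


counts = count_paths("ATP1A3", pathway_members, chemicals_per_gene)
for pathway, n in counts.items():
    print(pathway, n)
```

In this toy setup the single large pathway accounts for almost all of the paths, which is the pattern the comment describes: the answer count is driven by the number of genes attached to the intermediate node rather than by the query gene itself.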

sierra-moxon commented 1 year ago

From TAQA:

From Sharat: this should be broken down into four issues.

Big picture -> deep dive is important.

From Chris B: perhaps another issue here is how to tell the user "this is a known query with many results, sorry." Or, can we filter/sort our way out of this one? Is this doable? Suggest to the user that they tighten the query up: "Here are the common predicates associated with the answers you're getting back." Can we try to help the user write a better query?
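One way the "common predicates" suggestion could look in practice is a simple tally over the edges in a result set. This is only a sketch under assumptions: the edge records are plain dicts with a `predicate` key, the sample data is invented, and `common_predicates` is a hypothetical helper, not an existing Translator function.

```python
from collections import Counter


def common_predicates(edges, top_n=3):
    """Return (predicate, count, fraction) for the most frequent predicates."""
    counts = Counter(edge["predicate"] for edge in edges)
    total = sum(counts.values())
    return [(pred, n, n / total) for pred, n in counts.most_common(top_n)]


# Hypothetical result edges, for illustration only.
sample_edges = [
    {"subject": "ATP1A3", "predicate": "biolink:related_to", "object": "X1"},
    {"subject": "ATP1A3", "predicate": "biolink:related_to", "object": "X2"},
    {"subject": "ATP1A3", "predicate": "biolink:related_to", "object": "X3"},
    {"subject": "ATP1A3", "predicate": "biolink:treats", "object": "X4"},
]

for predicate, n, fraction in common_predicates(sample_edges):
    print(f"{predicate}: {n} edges ({fraction:.0%})")
```

A summary like this could be shown next to an oversized answer set so the user can see which predicates dominate and restrict the query to the ones they care about.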

From Sharat: agree; this is the best we can do: tell the user, here are ways to tighten it up. Work on user workflows for two big queries.

Andy is interested in more brainstorming on this; the UI needs direction. In the end, there is one place where the quality of the results is the measurement (either UI/O&O or someplace we could get it).

Andrew: Big "hub" nodes are taken into account in the Normalized Google Distance (which is used in scoring by BTE and ARAX) - this is a tunable parameter.
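For readers unfamiliar with it, the standard Normalized Google Distance between terms x and y is NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f counts occurrences (or co-occurrences) and N is the corpus size. The sketch below implements only that textbook formula; how BTE and ARAX actually compute, tune, or combine it is not shown in this thread, and the counts used are invented.

```python
import math


def ngd(f_x, f_y, f_xy, n_total):
    """Textbook Normalized Google Distance from raw occurrence counts.

    f_x, f_y: number of documents mentioning each term
    f_xy: number of documents mentioning both terms
    n_total: total number of documents in the corpus
    """
    if f_xy == 0:
        # Terms never co-occur: treat them as maximally distant.
        return math.inf
    log_x, log_y = math.log(f_x), math.log(f_y)
    numerator = max(log_x, log_y) - math.log(f_xy)
    denominator = math.log(n_total) - min(log_x, log_y)
    return numerator / denominator


# Hypothetical counts: same co-occurrence, but one term is a frequent "hub".
specific = ngd(f_x=1_000, f_y=1_000, f_xy=100, n_total=1_000_000)
hub = ngd(f_x=100_000, f_y=1_000, f_xy=100, n_total=1_000_000)
print(specific, hub)
```

With the same number of co-occurrences, the pair involving the very frequent term gets the larger distance, which is the sense in which the measure discounts hub nodes.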

sharatisrani commented 1 year ago

For the case of grouping records, O&O has a tracking issue at https://github.com/NCATSTranslator/Ordering-Organizing/issues/15, with a few additional comments. For the case of scoring records, O&O has a tracking issue at https://github.com/NCATSTranslator/Ordering-Organizing/issues/6

sharatisrani commented 1 year ago

This is a major issue, but how likely is it to bite us for the September release?

karafecho commented 1 year ago

I think this is being addressed as described here and recorded here.