NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

grouping/filtering the intermediary nodes AND edge - Type 2 diabetes drug results are unsatisfying #261

Open flannick opened 1 year ago

flannick commented 1 year ago

Focusing just on the top result (insulin), which I think illustrates some of the issues:

  1. There are seven paths, some of which are correct, but from a user's perspective I don't think it makes sense to combine them. A path that says insulin treats diabetes (evidence: clinical trial) is different than a path that says insulin engages INSR which is associated with diabetes (evidence: studies of INSR) and is different than the path through IGF1R (which is not really the primary mechanism). I would like to be able to restrict the results so things are ranked according to one type of path, which would reflect the mechanism or reasoning strategy I believe in.
  2. I would also like to be able to filter by the type of underlying data, e.g. "genetics" or "clinical" or "molecular". Ultimately everything in between the data and the result (the ontologies, the reasoning, the scoring) is less trustworthy than the data. We should surface the data more and also allow filtering or query tuning based on it.
  3. Insulin -> INSR -> polycystic ovary syndrome -> T2D is technically a path between insulin and T2D, but it is not the right mechanism here. I think it actually takes away from the answer -- by adding too many paths that each have different underlying meanings or models, it degrades trust in the answer. If we can't filter/group by "biological model", maybe we could hide the lower-scoring paths from answers like this (where I assume the top path is overwhelmingly stronger than the other paths).
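
To make the three requests above concrete, here is a minimal sketch of grouping paths by their "shape", filtering by evidence type, and hiding paths that score far below the best one. Everything in it is hypothetical: the `Path` class, the predicate names, the evidence tags, and the scores are invented for illustration and are not taken from the Translator UI, TRAPI, or any actual result set.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    nodes: tuple       # node labels from drug to disease
    predicates: tuple  # one (hypothetical) predicate per edge
    evidence: str      # hypothetical tag: "clinical", "genetics" or "molecular"
    score: float       # made-up score, only used for ordering


def group_paths(paths, evidence=None):
    """Group paths by their predicate sequence, optionally keeping one evidence type."""
    groups = defaultdict(list)
    for p in paths:
        if evidence is not None and p.evidence != evidence:
            continue
        groups[p.predicates].append(p)
    # Within each group, show the strongest path first.
    for members in groups.values():
        members.sort(key=lambda p: p.score, reverse=True)
    return dict(groups)


def prune_weak(paths, ratio=0.5):
    """Hide paths whose score is below `ratio` times the best score."""
    if not paths:
        return []
    best = max(p.score for p in paths)
    return [p for p in paths if p.score >= ratio * best]


# Illustrative data only; scores and tags are assumptions.
paths = [
    Path(("insulin", "T2D"), ("treats",), "clinical", 0.95),
    Path(("insulin", "INSR", "T2D"),
         ("interacts_with", "associated_with"), "molecular", 0.60),
    Path(("insulin", "IGF1R", "T2D"),
         ("interacts_with", "associated_with"), "molecular", 0.30),
    Path(("insulin", "INSR", "polycystic ovary syndrome", "T2D"),
         ("interacts_with", "associated_with", "associated_with"), "molecular", 0.10),
]

all_groups = group_paths(paths)
clinical_only = group_paths(paths, evidence="clinical")
visible = prune_weak(paths)
```

With this toy data, the direct "treats" path, the two-hop receptor paths, and the three-hop path through polycystic ovary syndrome land in separate groups, so a user could rank within one reasoning strategy at a time. The 0.5 pruning ratio is an arbitrary choice for the sketch.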
sierra-moxon commented 1 year ago

From TAQA: not many genetic sources are coming back here immediately; some results are tagged with a too-high Biolink category (e.g. NamedThing vs. Gene); would like a non-PubMed source (e.g. GWAS).

From Marc: could be a test vs. CI issue, but should be using biolink:Gene rather than biolink:NamedThing.

From Andy: although the 'NOT' can get complex in the facets, we could add "NOT FDA" easily.

https://ui.test.transltr.io/results?l=Diabetes%20Mellitus&t=0&q=500d73a9-d22d-479f-9cc2-7df5a5e5b0dd - rerunning on CI:

sierra-moxon commented 1 year ago

Also in the results: "insulin purified beef" from Improving Agent, also in ChEMBL.

sandrine-m commented 1 year ago

The test UI link provided here is for diabetes mellitus, not type 2 diabetes (results differ between the two associated diseases). Here is the PK for type 2: a428459d-d840-4b20-a0c0-5e58016d8efa

sandrine-m commented 1 year ago

(1) Results are complete. I do not find INSR in the test environment. I think the current cap on the number of results is limiting the mechanistic reasoning capabilities (see issue #388).
This is the current PK for CI (in case it appears there). @flannick which environment were you testing? Would you have the PK by any chance?
(2) Splitting the ticket.
(3) By "biological model", do you mean "biological mechanism" or "organism"/"cellular model" etc.? I think the issue you are describing here is similar to issue #385 and others with the connex results label. Do you think the action described in issue #385 would solve (at least in part) your issue?

sandrine-m commented 1 year ago

@flannick please tell me if you would agree that 1 and 3 are part of the same bigger issue of grouping/filtering the intermediary nodes AND edges to surface disconnected mechanisms? (Otherwise, I can split the ticket even more.)

sharatisrani commented 1 year ago

Today this is a UI feature request, though it informs a more sophisticated scoring mechanism for the future.

sharatisrani commented 1 year ago

@flannick to inform whether he needs f() scoring validation on this, or it's some other problem.

sierra-moxon commented 7 months ago

From TAQA: we don't think this is a showstopper; it's a feature request with regard to grouping, and we'll prioritize grouping along with other TACT priorities.

sierra-moxon commented 3 months ago

@sharatisrani - please close if this is unhelpful for the "grouping" discussion in O&O. Thank you!

sharatisrani commented 1 month ago

The original issue is a year old, and since then pathways have come back to life in grouping and ordering - how to group them and filter on them is a live topic again. So I am reassigning to Jenn and Andy to take advantage of Jason's thoughts (original filing, above) to sort out how pathways will be grouped/handled in results. Also cc'ing Rosina who has made some novelty scoring proposals around genesets.

@Genomewide @Rosinaweber @flannick @jh111

sstemann commented 3 weeks ago

@sharatisrani can we close this in the Feedback repo if the O&O WG is addressing it?