NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

Reasoning Path of MVP2 > Increases Activity of Intermediate Node, not the Queried Node. #840

Open khanspers opened 2 months ago

khanspers commented 2 months ago

What chemicals may increase the activity of INSR? https://ui.test.transltr.io/main/results?l=INSR%20(Human)&i=NCBIGene:3643&t=1&r=0&q=ca06f95a-604d-4d6b-89bd-99297a508d9b

A couple of issues:

  1. Ranking: Az628 has a score of 5 with just 2 sources listed as evidence, whereas Insulin has 120 publications as evidence and also has a score of 5. Several other results on the first page also have stronger evidence than Az628 and the same score of 5.

  2. Az628 result

image

From Improving in ARAX:

image

If the interaction between INSR and SOX2 were reversed, one could claim that AZ628 affects INSR. But the current graph doesn't show any evidence of AZ628 having an effect on INSR.

sstemann commented 2 months ago

I think this is a ticket for O&O. Maybe at the next O&O we could review with everyone how the Sugeno score is calculated and how the components are weighted, so that at the consortium level we know why we see results like this.

| subjectNode_name | URSOLIC ACID | AZ628 | Insulin | pioglitazone | crizotinib |
| --- | --- | --- | --- | --- | --- |
| subjectNode_id | CHEBI:9908 | CHEBI:91354 | UMLS:C0021641 | CHEBI:8228 | CHEBI:64310 |
| rank | 2 | 2 | 3 | 4 | 5 |
| sugeno_score | 1 | 1 | 1 | 1 | 1 |
| comp_confidence_score | 1 | 1 | 1 | 1 | 1 |
| comp_novelty_score | 0 | 0 | 0 | 0 | 0 |
| comp_clinical_evidence_score | 0 | 0 | 0 | 0 | 0 |
| weighted_mean_score | 0.48 | 0.48 | 0.48 | 0.48 | 0.48 |
| normalized_score | 8.27 | 100 | 100 | 97.12 | 98.56 |
| ARAX_score | 0.24 | 0 | 0 | 0.97 | 0.98 |
| unsecret_score | 1 | 0 | 0 | 1 | 0 |
| improving_agent_score | 0 | 1 | 0 | 0.04 | 0 |
| biothings_explorer_score | 0 | 0 | 0 | 0 | 0 |
| aragorn_score | 0.94 | 0 | 1 | 0.84 | 1 |
| ARA_list | ['infores:ARAX_rtx-kg2', 'infores:aragorn', 'infores:unsecret-agent'] | ['infores:improving-agent'] | ['infores:aragorn'] | ['infores:ARAX_rtx-kg2', 'infores:unsecret-agent', 'infores:improving-agent', 'infores:aragorn'] | ['infores:aragorn', 'infores:ARAX_rtx-kg2'] |
| ARA_count | 3 | 1 | 1 | 4 | 2 |

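
One way to read the scores above is to check which fields actually differ between the five results. The sketch below is only an illustration: the values are transcribed from the rows above, and the dictionary layout and the helper names (`distinct_values`, `single_ara_results`) are hypothetical, not part of any Translator code.

```python
# Values transcribed from the score rows above; only a subset of fields is used.
# The field names mirror the row labels; the structure itself is an assumption.
results = [
    {"name": "URSOLIC ACID", "sugeno_score": 1, "weighted_mean_score": 0.48, "ara_count": 3},
    {"name": "AZ628", "sugeno_score": 1, "weighted_mean_score": 0.48, "ara_count": 1},
    {"name": "Insulin", "sugeno_score": 1, "weighted_mean_score": 0.48, "ara_count": 1},
    {"name": "pioglitazone", "sugeno_score": 1, "weighted_mean_score": 0.48, "ara_count": 4},
    {"name": "crizotinib", "sugeno_score": 1, "weighted_mean_score": 0.48, "ara_count": 2},
]


def distinct_values(rows, field):
    """Return the sorted set of values that `field` takes across all rows."""
    return sorted({row[field] for row in rows})


def single_ara_results(rows):
    """Return the names of results that were reported by exactly one ARA."""
    return [row["name"] for row in rows if row["ara_count"] == 1]


# Fields on which every result is tied, so they cannot separate the results.
tied_fields = [
    field
    for field in ("sugeno_score", "weighted_mean_score")
    if len(distinct_values(results, field)) == 1
]

print(tied_fields)
print(single_ara_results(results))
```

With these numbers, both the Sugeno score and the weighted mean are identical across all five results, while the number of supporting ARAs ranges from 1 to 4, which is one concrete way to state the ranking concern.
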
sierra-moxon commented 2 months ago

@sharatisrani - can you please assign a sprint to fix this issue? If it is not doable before the end of this phase of Translator, please tag it with the "Next phase" label. Thank you :)

sharatisrani commented 2 months ago

This issue is another data point for the same problem as #845. Assigning it first to @brettasmi and @suihuang-ISB: why did Improving score it so high when it had only 2 pubs and no clinical evidence? Following that, we can figure out how confidence should handle cases like this. @dkoslicki @Rosinaweber

Rosinaweber commented 2 months ago

@sharatisrani @sstemann Note that none of the current confidence scores are aligned with the presented evidence. This is a problem that could only be fixed with the implementation of the EPC-based scoring, which would take longer than the current time we have left. Anyway, we are recording all these particular problems to make sure they are fixed with our preliminary re-engineering of the scoring by October.

sierra-moxon commented 1 month ago

from TAQA:

basically this is based on inference algorithm, likely because it is closely related to another node. Sui's team is working on removing some of these inferences (they know there are too many of them), but it's more a change to the weight of the inference (down-weight the inference instead of changing the algorithm).

lack of clinical evidence != no inference. does clinical evidence contribute to inference rules? Sui - it is being addressed; also addressing "too few" answers, this is a weird case. Addressing the more generic problem of over-ranking, too many inferred results. This release! :)

Next week talking about scoring trending to 1 - Sharat.

suihuang-ISB commented 1 month ago

Thanks Sierra for capturing the key points of the TAQA discussion yesterday in your comments. More details here since there actually are three issues here in this ticket:

[1] The ranking: In our case, when an answer is 'inferred', the score reflects the "points" a node in the answer graph has collected in our random walk training of relevance on clinical data ... so it tells you about some confidence in the empirical inference and not in the correctness of an answer. And depending on some parameter, we tend to produce too many inferred answers. Combine that with the artificial scaling and redistributing and other numerical magic, e.g. to avoid ties (which I have always been against, since ties are a fact of life), and we get these weird numbers. We are trying to fix the issue of too many inferred answers relative to "look ups".

[2] "Scoring high despite having only 2 pubs" : This is related. The score is not based on # pubs, but on clinical observation in one cohort in our PSEV training (currently: ~ 500k from UCSF, working on 2M from Seattle's Providence).

But this brings up a deeper point, a dirty secret that people without practical experience in biomedical research may not appreciate, but "is well known to those who know it well": MEDICAL LITERATURE is NOT EVIDENCE! It captures researchers' collective ignorance, misconceptions, etc. Tyler once, early on, jokingly said that 80% of medical papers are wrong. But this is no joke. And last year a group estimated that 24% of medical papers are plagiarized (or even fabricated). Sad, but a reality that we need to consider for EPC. To better understand these text-mining edges, I have now randomly read dozens of the papers that TMKP has provided as support for this example and recent similar ones, and found that the vast majority do not qualify as scientific evidence. In addition, the fact that papers copy and influence each other (even without plagiarism) means that they are not independent, hence their number cannot quantify the magnitude of evidential support. And many are in predatory journals (which are now tagged).

Possible solutions:

(i) Weight papers by IF? (Not a great idea, we all hate IF, but better than nothing.)
(ii) Similar to taking care of the problem of autocorrelation: if Paper (A)'s assertion cites Paper (B), then these two papers should be counted as one "effective" paper.
(iii) As I have been advocating for a while: the only papers that are accepted as evidence, and are in fact the basis for standard-of-care guidelines in EBM (evidence-based medicine), are META-ANALYSES. These are now declared as such in PubMed ([PT = Meta-Analysis]), so they should be easy to spot, dear TMKP!
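
Suggestion (ii) can be sketched as a grouping problem: papers linked by a citation are merged, and each resulting group counts as one "effective" paper. The code below is a minimal illustration under stated assumptions, not an existing TMKP or Translator routine. The function name `effective_paper_count`, the paper identifiers, and the choice to merge transitively (if A cites B and B cites C, all three count once) are assumptions made for the example.

```python
def effective_paper_count(papers, citations):
    """Count groups of papers connected by citation links (union-find).

    papers: iterable of paper identifiers supporting one assertion.
    citations: iterable of (citing, cited) pairs; pairs that mention a
        paper outside `papers` are ignored.
    """
    papers = list(papers)
    parent = {paper: paper for paper in papers}

    def find(paper):
        # Follow parent links to the group representative, compressing the path.
        while parent[paper] != paper:
            parent[paper] = parent[parent[paper]]
            paper = parent[paper]
        return paper

    for citing, cited in citations:
        if citing in parent and cited in parent:
            # Merge the two groups; the citation direction does not matter here.
            parent[find(citing)] = find(cited)

    return len({find(paper) for paper in papers})


# Hypothetical identifiers: A cites B and B cites C, while D is independent.
papers = ["A", "B", "C", "D"]
citations = [("A", "B"), ("B", "C")]
print(effective_paper_count(papers, citations))  # A, B, C form one group; D is its own
```

In this toy case, four supporting papers reduce to two effective papers. Whether merging should be transitive, or limited to direct citations of the same assertion, is a design choice the comment above leaves open.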

[3] The strange support graphs with "inverted direction" of edges ("arrow points in wrong direction"). This is an old problem resurfacing and has to do with how we map both directionality and polarity of directed edges between biolink and SPOKE. Must be fixed!! and is fixable - working on it.

sharatisrani commented 1 month ago

I am tying in point 2 from "Feedback from SME Peden" (issue #849) here. As @suihuang-ISB explained on Jul 17 at the O&O, it is being fixed: how come Improving assigns such a high score to a result when it is the only ARA that returns that result?

sstemann commented 3 weeks ago

what is being done on this ticket for Guppy?

sierra-moxon commented 1 week ago

from TAQA: we will retest (@sharatisrani - thank you!)

khanspers commented 1 week ago

Follow-up: After reading all the comments again, they all seem to address the ranking issue (why is Az628 ranked higher than Insulin?). However, the other issue is unrelated (I think?): the supporting graph from Improving for the inferred path from Az628 to INSR doesn't seem to actually support the conclusion that Az628 causes increased INSR activity. To summarize:

In Translator UI, the following is reported:

Az628 - causes increased expression of - SOX2 - has increased activity or abundance caused by - INSR

In ARAX, it looks like this:

Screen Shot 2024-08-30 at 3 16 34 PM

I just tested this again on both ui.test.transltr.io and ui.transltr.io, and the results are the same. In fact, many other top-scoring answers from Improving have the exact same pattern of interactions, with SOX2 in the middle: CH4987655, Refametinib, retinoic acid, PJ34, ulixertinib, cytochalasin B, etc. None of these chemicals are returned by any other ARA. I see a comment from July 19 that may be related to this: "basically this is based on inference algorithm, likely because it is closely related to another node".

sharatisrani commented 1 week ago

The "strange" support graph arrow directions don't look strange in the UI (just in ARAX), so one can consider that not to be a problem [any more].

The "low-evidence" results are from Improving, and the explanation is above. Now it's up to Improving or Kristin to decide when/whether to close this issue.

Also the confidence scoring may be changed for Hammerhead (that decision is pending on why so many top-K results are failing with the new scoring). If that happens, the above problem will also go away without any change from Improving.

sstemann commented 1 week ago

I think the path that @khanspers points out is actually a reasoning issue, and you do see it in both ARAX and the UI.

Az628 Causes Increased Expression Of Sox2 image

Sox2 Has Increased Activity Or Abundance Caused By Insr image

The query was about INSR, not SOX2. So this path doesn't seem to answer the question "What chemicals may increase the activity of INSR?"

It seems to say SOX2 has increased expression/activity caused by Az628/Insr.
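
The objection can be made concrete by writing both edges in cause-to-effect form and asking whether a directed path runs from the chemical to the queried gene. The sketch below is a simplified illustration, not the reasoning code of any ARA. The edge list is transcribed from the two statements above, the helper `has_causal_path` is hypothetical, and it treats every edge as a plain "increases" relation without modeling Biolink qualifiers.

```python
# Each edge is (cause, effect): the cause is asserted to increase the effect.
edges = [
    ("AZ628", "SOX2"),  # Az628 causes increased expression of SOX2
    ("INSR", "SOX2"),   # SOX2 has increased activity or abundance caused by INSR
]


def has_causal_path(edges, source, target):
    """Return True if a directed cause-to-effect path leads from source to target."""
    graph = {}
    for cause, effect in edges:
        graph.setdefault(cause, []).append(effect)

    stack, seen = [source], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False


print(has_causal_path(edges, "AZ628", "INSR"))  # False: both edges point into SOX2
print(has_causal_path(edges, "AZ628", "SOX2"))  # True: the only effect supported is on SOX2
```

Under this reading, both edges converge on SOX2, so the graph supports an effect of Az628 on SOX2 but gives no directed route to INSR. A path would exist only if the INSR-SOX2 edge pointed the other way.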

suihuang-ISB commented 6 days ago

Thanks @khanspers for bringing this up again. There are multiple layers of confusion here, which squares the confusion. There are some possibly erroneous data modeling issues in SPOKE, but also the inartful flipping of arrows by the UI (combined with flipping from active to passive verb in the predicate) that adds confusion. It should not be a reasoning error. We are sorting it out and will get back here!