NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

Example of issue with ranking (O&O): MVP2 Bivalirudin #765

Closed TranslatorIssueCreator closed 4 weeks ago

TranslatorIssueCreator commented 5 months ago

Type: Bug Report

URL: https://ui.test.transltr.io/main/results?l=Bivalirudin&i=PUBCHEM.COMPOUND:16129704&t=4&r=0&q=e034c7d1-3f8e-473b-aceb-dbd9382fda7f

ARS PK: e034c7d1-3f8e-473b-aceb-dbd9382fda7f

Steps to reproduce:

What genes may be downregulated by:Bivalirudin?

Screenshots:

sandrine-muller-research commented 5 months ago

F2 and thrombin are targets validated in multiple databases, but they are not coming up as top answers. Instead, the top answers (which have the same scores as F2 and thrombin, i.e. ties) are based only on text-mining evidence (less confident, IMO).

sandrine-muller-research commented 5 months ago

It is not very clear why PLATELET GLYCOPROTEIN GPIIB-IIIA COMPLEX comes before F2: [image]

sstemann commented 2 months ago

I'm not sure how the rank is calculated, other than that it's from Appraiser. From the flattened Excel file:
Bivalirudin_test_2024_7-5_17_43_0e24a207-aa29-42ef-92c8-f3492d2cba4a.xlsx

| objectNode_name | Platelet Glycoprotein GPIIb-IIIa Complex (MESH:D019039) | Platelet Glycoprotein GPIIb-IIIa Complex (UMLS:C0016011) | thrombin (UMLS:C0040018) | F2 (NCBIGene:2147) |
| --- | --- | --- | --- | --- |
| rank | 2 | 3 | 4 | 5 |
| comp_confidence_score | 1 | 1 | 1 | 1 |
| comp_novelty_score | 0 | 0 | 0 | 0 |
| comp_clinical_evidence_score | 0 | 0 | 0 | 0 |
| weighted_mean_score | 0.48 | 0.48 | 0.48 | 0.48 |
| normalized_score | 100 | 96.8 | 18.8 | 100 |
| ARAX_score | 1 | 0 | 0 | 0.93 |
| unsecret_score | 0 | 0.99 | 0 | 0.17 |
| improving_agent_score | 0 | 0 | 0 | 0 |
| biothings_explorer_score | 0 | 0.34 | 0.21 | 0.86 |
| aragorn_score | 1 | 1 | 1 | 1 |

sugeno_score: 1 1 1 (only three values are present; which column is blank is unclear)

sharatisrani commented 2 months ago

@sandrine-muller-research's questions are fundamentally questions of reasoning, so I added that label. The ARAs are newer at MVP2 reasoning than at MVP1 reasoning.

@MarkDWilliams to clarify the same question from several other issues: how come the ranks are sequential when the scores are the same? Or should we be asking @gaurav?

gaurav commented 2 months ago

Or should we be asking @gaurav

I know nothing about the ranking, so I can't help with that. But I know there are multiple F2 cliques (#764), which I'm hoping to close by Guppy. I'm going to remove myself from this ticket, but please add me back in if there's a node normalization issue not already captured by #764!

sandrine-muller-research commented 2 months ago

retested today

@sharatisrani F2 and thrombin do appear, so the "reasoning" works, although these should clearly be lookup edges. In this issue I am trying to understand why F2 does not come first. The first result in the UI is PLATELET GLYCOPROTEIN GPIIB-IIIA COMPLEX, which is supported only by text-mining edges, while F2, which comes in 4th position, is supported by 10 paths, of which 3 are lookups.

@sstemann thank you so much for posting this table, it is really useful for studying this in more detail. (In the UI it seems I have a slightly different ranking now.) I agree with you: I am unclear how the ranking is done in those conditions. Shouldn't it be based on the total number of supporting paths and/or whether there are any lookup edges?
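
To make the suggestion above concrete, here is a minimal sketch of a tie-break that prefers results with more lookup paths, then more total paths, when scores are equal. It is an illustration only, not the ARS or Appraiser logic: the `Result` class, the `rank_with_tiebreak` function, and the path count for the platelet complex are hypothetical. Only the F2 figures (10 paths, 3 lookups) and the 0.48 score come from this thread.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    score: float       # e.g. weighted_mean_score
    total_paths: int   # number of supporting paths
    lookup_paths: int  # paths made only of lookup (non text-mined) edges


def rank_with_tiebreak(results):
    """Sort by score, breaking ties by lookup paths, then by total paths."""
    return sorted(
        results,
        key=lambda r: (-r.score, -r.lookup_paths, -r.total_paths),
    )


results = [
    # total_paths=2 is a made-up value; the thread only says "text mining only".
    Result("Platelet Glycoprotein GPIIb-IIIa Complex", 0.48, total_paths=2, lookup_paths=0),
    # 10 paths, 3 of them lookups, as reported in the comment above.
    Result("F2", 0.48, total_paths=10, lookup_paths=3),
]

ranked = rank_with_tiebreak(results)
print([r.name for r in ranked])
```

With equal scores, F2 sorts first because it has lookup support; whether lookups should outrank path count is a design choice left open here.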

sstemann commented 1 month ago

this got tagged for Guppy, what is going to change to resolve this? @sierra-moxon

MarkDWilliams commented 1 month ago

@sharatisrani the sequential ranks are due to the way that Rosina's scoring/ranking code deals with ties. The logic is all in the scoring.py file in the ARS code, but if you have questions about the algorithm, I believe Prateek wrote it and would be the best person to talk to.
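
As a rough illustration of how tied scores can still yield sequential ranks, the sketch below contrasts two common tie-handling conventions. It is an assumption about the general technique, not a copy of what scoring.py does; the function names `ordinal_ranks` and `competition_ranks` are hypothetical.

```python
def ordinal_ranks(scores):
    """Rank 1..n by descending score; ties get consecutive ranks in input order."""
    # sorted() is stable, so tied items keep their original relative order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def competition_ranks(scores):
    """Tied scores share the best rank (1 + number of strictly higher scores)."""
    return [1 + sum(other > s for other in scores) for s in scores]


# Four results that all share weighted_mean_score 0.48, as in the table above.
weighted_mean = [0.48, 0.48, 0.48, 0.48]
print(ordinal_ranks(weighted_mean))      # [1, 2, 3, 4]
print(competition_ranks(weighted_mean))  # [1, 1, 1, 1]
```

Under the ordinal convention, the order among tied results depends entirely on the order in which they arrive, which would be consistent with the ranks shifting between retests.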

sstemann commented 1 month ago

@sandrine-muller-research I think this is then a question for @Rosinaweber and @pg427. In the table, it looks like F2 has more ARAs returning it.
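
The observation about ARA support can be checked directly against the per-ARA scores in the table posted earlier. The sketch below assumes that a non-zero score means the ARA returned that result, which is an interpretation and not something the thread confirms; the `supporting_aras` helper is hypothetical.

```python
# Per-ARA scores copied from the flattened Excel table in this thread.
agent_scores = {
    "Platelet Glycoprotein GPIIb-IIIa Complex (MESH:D019039)": {
        "ARAX": 1, "unsecret": 0, "improving_agent": 0,
        "biothings_explorer": 0, "aragorn": 1,
    },
    "Platelet Glycoprotein GPIIb-IIIa Complex (UMLS:C0016011)": {
        "ARAX": 0, "unsecret": 0.99, "improving_agent": 0,
        "biothings_explorer": 0.34, "aragorn": 1,
    },
    "thrombin (UMLS:C0040018)": {
        "ARAX": 0, "unsecret": 0, "improving_agent": 0,
        "biothings_explorer": 0.21, "aragorn": 1,
    },
    "F2 (NCBIGene:2147)": {
        "ARAX": 0.93, "unsecret": 0.17, "improving_agent": 0,
        "biothings_explorer": 0.86, "aragorn": 1,
    },
}


def supporting_aras(scores):
    """Return the ARAs with a non-zero score (assumed to mean 'returned it')."""
    return [ara for ara, score in scores.items() if score > 0]


support_counts = {name: len(supporting_aras(s)) for name, s in agent_scores.items()}
for name, count in support_counts.items():
    print(f"{count} ARAs: {name}")
```

Under that assumption, F2 is returned by four ARAs, against two or three for the other tied results.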

sierra-moxon commented 4 weeks ago

from TAQA: retesting....

sierra-moxon commented 4 weeks ago

[Screenshot: Screen Shot 2024-08-30 at 9 18 21 AM]

sharatisrani commented 4 weeks ago

Re-running this question via the UI, the results are still roughly unchanged. So I will go into the O&O tables, look at what's coming back, and "debug" things, knowing that some things will change (e.g. rank assignments). More this weekend.

OK as below, things are good. Closing the issue.

sandrine-muller-research commented 4 weeks ago

Thank you @sharatisrani. Today's retest is here. Linked with issue #929 for ranking of top results.

The top result is now F2, as we would expect, so the issue is fixed. I retested the "other" bivalirudin (normalization issue) and it does not return anything. Tracking the normalization issue in issue #932.

sharatisrani commented 4 weeks ago

Closing the issue.

sstemann commented 4 weeks ago

this is in Prod with Fugu

https://ui.transltr.io/results?l=Bivalirudin&i=CHEBI:59173&t=4&r=0&q=535628d2-58ed-4699-a659-c397ffd0548c
