Closed TranslatorIssueCreator closed 4 weeks ago
F2 and thrombin are targets validated in multiple databases, yet they are not coming up as top answers. Instead, the top answers (which have the same scores as F2 and thrombin, i.e. ties) are based only on text-mining evidence (less confident IMO).
It is not clear why PLATELET GLYCOPROTEIN GPIIB-IIIA COMPLEX comes before F2:
I'm not sure how the rank is calculated, other than that it's from Appraiser. In the flattened Excel:
Bivalirudin_test_2024_7-5_17_43_0e24a207-aa29-42ef-92c8-f3492d2cba4a.xlsx
objectNode_name | Platelet Glycoprotein GPIIb-IIIa Complex (MESH:D019039) | Platelet Glycoprotein GPIIb-IIIa Complex (UMLS:C0016011) | thrombin (UMLS:C0040018) | F2 (NCBIGene:2147) |
---|---|---|---|---|
rank | 2 | 3 | 4 | 5 |
sugeno_score | 1 | 1 | 1 | |
comp_confidence_score | 1 | 1 | 1 | 1 |
comp_novelty_score | 0 | 0 | 0 | 0 |
comp_clinical_evidence_score | 0 | 0 | 0 | 0 |
weighted_mean_score | 0.48 | 0.48 | 0.48 | 0.48 |
normalized_score | 100 | 96.8 | 18.8 | 100 |
ARAX_score | 1 | 0 | 0 | 0.93 |
unsecret_score | 0 | 0.99 | 0 | 0.17 |
improving_agent_score | 0 | 0 | 0 | 0 |
biothings_explorer_score | 0 | 0.34 | 0.21 | 0.86 |
aragorn_score | 1 | 1 | 1 | 1 |
@sandrine-muller-research questions are fundamentally questions of reasoning, so I added that label. The ARAs are newer at MVP2 reasoning than at MVP1 reasoning.
@MarkDWilliams to clarify the same question from several other issues: how come the ranks are sequential when the scores are the same? Or should we be asking @gaurav?
> Or should we be asking @gaurav
I know nothing about the ranking, so I can't help with that. But I know there are multiple F2 cliques (#764), which I'm hoping to close by Guppy. I'm going to remove myself from this ticket, but please add me back in if there's a node normalization issue not already captured by #764!
retested today
@sharatisrani F2 and thrombin do appear, so the "reasoning" works, although these should clearly be lookup edges. In this issue I am trying to understand why F2 does not come first. The first result in the UI is PLATELET GLYCOPROTEIN GPIIB-IIIA COMPLEX, which is supported only by text-mining edges, while F2, which comes in 4th position, is supported by 10 paths, of which 3 are lookups.
@sstemann thank you so much for posting this table, it is really useful for studying this in more detail. (In the UI I seem to get a slightly different ranking now.) I agree with you: I am unclear how the ranking is done under those conditions. Shouldn't it be based on the total number of supporting paths and/or whether there are any lookup edges?
This got tagged for Guppy. What is going to change to resolve this? @sierra-moxon
@sharatisrani the sequential ranks are due to the way that Rosina's scoring/ranking code deals with ties. The logic is all in the scoring.py file in the ARS code, but if you have questions about the algorithm, I believe Prateek wrote it and would be the best person to talk to.
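For anyone following along, here is a minimal sketch of how tied scores can still end up with sequential ranks. It is not the logic in scoring.py; `ordinal_ranks` and `competition_ranks` are hypothetical helpers written only for illustration, and the input reuses the tied `weighted_mean_score` values (0.48) from the table above.

```python
def ordinal_ranks(scores):
    """Rank by descending score; tied scores get consecutive ranks in input order."""
    # sorted() is stable, so equal scores keep their original relative order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


def competition_ranks(scores):
    """Rank by descending score; tied scores share the same rank."""
    # A result's rank is 1 plus the number of strictly higher scores.
    return [1 + sum(other > s for other in scores) for s in scores]


# The four tied weighted_mean_score values from the table (assumed to be the ranking key).
weighted_mean = [0.48, 0.48, 0.48, 0.48]

print(ordinal_ranks(weighted_mean))      # [1, 2, 3, 4]
print(competition_ranks(weighted_mean))  # [1, 1, 1, 1]
```

With ordinal ranking, the order among tied results depends only on the order in which they reach the sort, which would explain sequential ranks that carry no information about relative quality.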
@sandrine-muller-research I think this is then a question for @Rosinaweber and @pg427. In the table, it seems like F2 has more ARAs returning it.
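That observation can be checked against the table by counting, for each result, how many ARAs gave it a non-zero score. This is only a sketch: treating a non-zero per-ARA score as "this ARA returned the result" is an assumption, and `ara_support` is a hypothetical helper, not part of the ARS code. The values are copied from the per-ARA score rows of the table.

```python
# Per-ARA scores copied from the table, in the order:
# ARAX, unsecret, improving_agent, biothings_explorer, aragorn.
ara_scores = {
    "Platelet Glycoprotein GPIIb-IIIa Complex (MESH:D019039)": [1, 0, 0, 0, 1],
    "Platelet Glycoprotein GPIIb-IIIa Complex (UMLS:C0016011)": [0, 0.99, 0, 0.34, 1],
    "thrombin (UMLS:C0040018)": [0, 0, 0, 0.21, 1],
    "F2 (NCBIGene:2147)": [0.93, 0.17, 0, 0.86, 1],
}


def ara_support(scores):
    """Count ARAs with a non-zero score (assumed to mean the ARA returned the result)."""
    return sum(1 for value in scores if value > 0)


support = {name: ara_support(scores) for name, scores in ara_scores.items()}
for name, count in support.items():
    print(f"{count} ARAs: {name}")
```

Under that assumption, F2 is returned by 4 of the 5 ARAs, against 2 or 3 for the other tied results, so ARA support is one candidate tie-breaker.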
from TAQA: retesting....
Re-running this query via the UI, the results are still roughly unchanged. So I will go into the O&O tables, look at what's coming back and "debug" things, knowing that some things will change (e.g. rank assignments). More this weekend.
OK as below, things are good. Closing the issue.
Thank you @sharatisrani. Today's retest is here. Linked with issue #929 for ranking of top results.
The top result is now F2, as we would expect, so the issue is fixed. I retested the "other" bivalirudin (normalization issue) and it does not return anything. Tracking the normalization issue in #932.
Closing the issue.
Type: Bug Report
URL: https://ui.test.transltr.io/main/results?l=Bivalirudin&i=PUBCHEM.COMPOUND:16129704&t=4&r=0&q=e034c7d1-3f8e-473b-aceb-dbd9382fda7f
ARS PK: e034c7d1-3f8e-473b-aceb-dbd9382fda7f
Steps to reproduce:
What genes may be downregulated by:Bivalirudin?
Screenshots: