NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

imProving Agent returns results with drug names like "Pubchem.compound:6710690" instead of "Pharmakon1600-01504273", with many similar examples, i.e. CURIEs are being returned instead of names #568

Open TranslatorIssueCreator opened 1 year ago

TranslatorIssueCreator commented 1 year ago

Type: Bug Report

URL: https://ui.transltr.io/main/results?l=VPS13B%20(Human)&i=NCBIGene:157680&t=1&q=9c390038-73fc-4d7d-8c63-c08d85eda8b0

ARS PK: 9c390038-73fc-4d7d-8c63-c08d85eda8b0

Steps to reproduce:

Search for drugs that upregulate VPS13B

Screenshots:

gglusman commented 1 year ago

A better example: "Pubchem.compound:151537" instead of "4'-Epidoxorubicin (hydrochloride)". Or: "Pubchem.compound:9841834" instead of "Istaroxime".

sandrine-muller-research commented 1 year ago

I queried the compound Pubchem.compound:151537 through the NodeNorm endpoint and found that the label NodeNorm chose for this compound is: "(7S,9S)-7-[(2S,4S,5R,6S)-4-amino-5-hydroxy-6-methyloxan-2-yl]oxy-6,9,11-trihydroxy-9-(2-hydroxyacetyl)-4-methoxy-8,10-dihydro-7H-tetracene-5,12-dione". So here is my guess at what is happening, though I'll need confirmation from the UI team (@dnsmith124): when the label is too long, the decision was made to show the ID. The user here seems to be asking whether another rule could be used.

The second issue here is that NodeNorm is not choosing the optimal label. This is a known issue.
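
To make the "another rule" question concrete, here is a minimal sketch of one possible fallback rule. It is not the UI's or the ARS's actual logic: the function name `choose_display_name`, the `synonyms` argument, and the 60-character cutoff are all assumptions made for illustration.

```python
def choose_display_name(curie, preferred_label, synonyms=(), max_len=60):
    """Return a name to display for a node.

    Hypothetical rule (an assumption, not the deployed behaviour):
      1. use the preferred label if it is present and short enough,
      2. otherwise use the first synonym that is short enough,
      3. otherwise fall back to the CURIE.
    """
    for candidate in (preferred_label, *synonyms):
        text = (candidate or "").strip()
        if text and len(text) <= max_len:
            return text
    return curie


# A short preferred label is used as-is.
print(choose_display_name("PUBCHEM.COMPOUND:9841834", "Istaroxime"))
```

With a rule like this the CURIE is only shown when no short label is available at all, so the result still depends on a shorter label existing somewhere for the compound.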

dnsmith124 commented 1 year ago

@gprice1129 can you speak to whether the backend is doing this with the names? The UI's frontend simply displays the names provided, and in the results I'm seeing from the example the 'Pubchem' terms are being given as the names for these results.

Genomewide commented 1 year ago

@sandrine-muller-research where does your preferred name come from?

gprice1129 commented 1 year ago

@dnsmith124 @sandrine-m @MarkDWilliams the backend just takes the names we are given by the ARS. The ARS should be converting these names from CURIEs to whatever name is decided as the "best" one by NodeNorm.
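
One way to see where in the chain the CURIEs leak through is to scan the nodes of a returned message for names that are missing or that still look like CURIEs. The sketch below assumes the nodes arrive as a mapping from CURIE to an object with an optional "name" field; the regular expression and both function names are illustrative, not part of any Translator component.

```python
import re

# Rough CURIE shape: a prefix, a colon, then a local ID with no spaces.
CURIE_LIKE = re.compile(r"^[A-Za-z][A-Za-z0-9_.]*:[^\s:]+$")


def looks_like_curie(name):
    """True if the string has the shape prefix:local_id."""
    return bool(CURIE_LIKE.match(name.strip()))


def nodes_named_by_curie(nodes):
    """Return the sorted keys of nodes whose name is missing or CURIE-like."""
    return sorted(
        curie
        for curie, node in nodes.items()
        if not node.get("name") or looks_like_curie(node["name"])
    )


# Node data modelled on the examples in this thread.
nodes = {
    "PUBCHEM.COMPOUND:6710690": {"name": "Pubchem.compound:6710690"},
    "PUBCHEM.COMPOUND:9841834": {"name": "Istaroxime"},
}
print(nodes_named_by_curie(nodes))
```

Running the same check on the response of each agent and on the merged ARS output would show whether the CURIE-like names are already present before the ARS step.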

sandrine-muller-research commented 1 year ago

@Genomewide from the NodeNorm PROD endpoint: (image) @MarkDWilliams does the ARS do anything on top of NodeNorm to decide the best label for the compound?

sstemann commented 6 months ago

This still happens; I don't know if there is a solution. @gaurav https://ui.test.transltr.io/main/results?l=VPS13B%20(Human)&i=NCBIGene:157680&t=1&r=0&q=d9bc14f5-c11a-4625-aef7-1ed76c3f7179

gaurav commented 2 months ago

Here's how we're doing on NodeNorm CI:

I'm tracking non-good preferred names in this spreadsheet as well as https://github.com/TranslatorSRI/Babel/issues/306, but that work won't help these two cliques, because none of the other identifiers have a good label for this identifier. So we will probably need to pull in additional sources of labels and identifiers to fully fix this. I'm going to come back to this in Hammerhead, but unless there's a good source we're missing this will likely go unfixed this year.
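
A rough sketch of the check described above, i.e. whether any identifier in a clique carries a readable label. It assumes a clique is a dict with an "equivalent_identifiers" list of entries that may have a "label"; the stereodescriptor regex, the 80-character cutoff, and the function names are crude heuristics chosen for illustration, not what Babel or NodeNorm actually do.

```python
import re

# Stereodescriptor groups such as "(7S,9S)" or "(2S,4S,5R,6S)" are typical
# of systematic chemical names.
STEREO = re.compile(r"\(\d+[RSEZ](?:,\d+[RSEZ])*\)")


def looks_systematic(label, max_len=80):
    """Heuristic: very long labels or stereodescriptors suggest a systematic name."""
    return len(label) > max_len or bool(STEREO.search(label))


def first_readable_label(clique):
    """Return the first label in the clique that does not look systematic, else None."""
    for entry in clique.get("equivalent_identifiers", []):
        label = entry.get("label")
        if label and not looks_systematic(label):
            return label
    return None


# A clique in which the only label is a systematic name yields None.
clique = {
    "equivalent_identifiers": [
        {"identifier": "EXAMPLE:1", "label": "(7S,9S)-7-[...]oxy-tetracene-5,12-dione"},
        {"identifier": "EXAMPLE:2"},
    ]
}
print(first_readable_label(clique))
```

When this returns None for a clique, reordering the existing labels cannot help, which is the case where an additional label source would be needed.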

(There's another ticket where we're discussing other solutions, such as having the UI display the CURIE -- "PUBCHEM.COMPOUND:151537" instead of "(7S,9S)-..." -- see https://github.com/NCATSTranslator/Feedback/issues/759)

sstemann commented 2 months ago

It still happens a lot and is most obvious on a new query, since Improve and Unsecret are the fastest to return.
