NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

Nameunknown Node #779

Closed sstemann closed 3 months ago

sstemann commented 4 months ago

I've seen this recently: NameUnknown nodes.

I'm guessing the special characters in the name may be an issue? Asking UI and Gaurav to take a look.

https://ui.test.transltr.io/main/results?l=CACNA1A%20(Human)&i=NCBIGene:773&t=1&r=0&q=cd9ce523-b234-4b62-8396-0c9f98d54fef

Search > Nameunknown, or sometimes it's at the top:

image

link out to: https://pubchem.ncbi.nlm.nih.gov/compound/54625162 and https://pubchem.ncbi.nlm.nih.gov/compound/54625293

dnsmith124 commented 4 months ago

The name resolver returns a gateway timeout (in TEST at least) when searching for the name of that first PubChem link.

sstemann commented 4 months ago

here's another example: Aceruloplasminemia https://ui.test.transltr.io/main/results?l=Aceruloplasminemia&i=MONDO:0011426&t=0&r=0&q=3d90aad3-67fc-482a-9a6f-09cad5b4ee14

sstemann commented 4 months ago

Ack here it is in the support graphs

image

gaurav commented 4 months ago

> The name resolver returns a gateway timeout (in TEST at least) when searching for the name of that first PubChem link.

Are you using the NameRes /reverse_lookup to get all the synonyms for the results? It should be faster to use NodeNorm's /get_normalized_nodes endpoint, and then you won't need to normalize the identifier before sending it to NameRes, e.g. https://nodenorm.test.transltr.io/1.4/get_normalized_nodes?curie=PUBCHEM.COMPOUND%3A54625162&conflate=false&drug_chemical_conflate=true&description=false (yes, it's a hideous name, but we don't have a better one for that chemical unless we want to fall back to the CHEMBL ID, which we currently filter out).
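For anyone wiring this up, here is a minimal sketch of that suggestion in Python. It only builds the request URL with the same flags as the example link and pulls a label out of a parsed response. The helper names are hypothetical, and the response shape (a dict keyed by CURIE, with the label under `id.label` and a null value for unresolved CURIEs) is an assumption that should be checked against the NodeNorm documentation.

```python
from urllib.parse import urlencode

# Base URL taken from the example link above (TEST instance, version 1.4).
NODENORM_URL = "https://nodenorm.test.transltr.io/1.4/get_normalized_nodes"


def build_nodenorm_url(curie: str) -> str:
    """Build a get_normalized_nodes URL with the same flags as the example link."""
    params = {
        "curie": curie,
        "conflate": "false",
        "drug_chemical_conflate": "true",
        "description": "false",
    }
    return f"{NODENORM_URL}?{urlencode(params)}"


def preferred_label(response: dict, curie: str):
    """Return the preferred label for `curie`, or None if there is none.

    ASSUMED response shape: {curie: {"id": {"identifier": ..., "label": ...}}},
    with a null/None value for CURIEs that could not be normalized.
    """
    entry = response.get(curie) or {}
    return (entry.get("id") or {}).get("label")
```

The URL can then be fetched with any HTTP client and the decoded JSON passed to `preferred_label`; a `None` result is the case where a caller has to decide what to display instead.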

We have been noticing gateway timeouts on NameLookup Dev and are working on increasing the resources available to it. We might try to push that to CI soon, but it won't be part of this Prod release unless we need to accelerate it.

sstemann commented 4 months ago

5/30 after NN update in Test, nameunknown is appearing in the subgraphs:

image

is this still a timeout issue? @dnsmith124

https://ui.test.transltr.io/main/results?l=Aceruloplasminemia&i=MONDO:0011426&t=0&r=0&q=77003cf0-f9ff-4864-bc02-b89f2c216c82

dnsmith124 commented 4 months ago

@gaurav I should have specified: I was trying to see how the NR handled the special characters in one of the terms (((2S)-2-[(4S,5S)-5-[[1,3-benzodioxol-5-ylmethyl(methyl)amino]methyl]-8-[3-(dimethylamino)prop-1-ynyl]-4-methyl-1,1-dioxo-4,5-dihydro-3H-6,1$l^{6},2-benzoxathiazocin-2-yl]-1-propanol)), so I used /lookup on Test, and that's what gave the gateway timeout. Inputting the CURIE (CHEBI:128184) into /reverse_lookup does not result in a timeout.
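As a side check on the special-character theory, a name like this can be percent-encoded before it goes into a query string, which rules out the URL itself being mangled. The sketch below uses only the standard library; `encode_query_value` is a hypothetical helper, and nothing here is claimed about how the UI or NameRes actually encodes its requests.

```python
from urllib.parse import quote, unquote


def encode_query_value(value: str) -> str:
    """Percent-encode every reserved character so the value survives in a URL."""
    # safe="" means even "/" is encoded; only letters, digits and "_.-~" pass through.
    return quote(value, safe="")


# A contiguous fragment of the chemical name above, chosen for its special characters.
fragment = "3H-6,1$l^{6},2-benzoxathiazocin-2-yl]"
encoded = encode_query_value(fragment)

# Decoding the encoded value gives back the original string.
round_trip = unquote(encoded)
```

If the encoded form still times out on /lookup, the characters are probably not the cause, which fits the later observation that the timeout may be a red herring.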

@sstemann the timeout issue could be a red herring. Here are the results in the ARAX GUI:

Screenshot 2024-05-30 at 3 26 32 PM

Looks like the NameUnknowns are coming from Improving.

sstemann commented 3 months ago

I think the ones in the middle are from ARAGORN/pathwhiz. It doesn't inspire much trust to have results labeled NameUnknown.

dnsmith124 commented 3 months ago

@sstemann completely agree. Since this is appearing across results from different ARAs and at a level above the UI, it would make sense for some tool that touches all the results to be providing "NameUnknown".

That label is also not a standard way of referring to an entity or label that has no value (at least as far as I'm aware), so it has to be explicitly provided by some tool.

@gaurav is "NameUnknown" a possible value that the Name Res or Node Norm can return? Trying to get to the bottom of where this label is coming from.

@MarkDWilliams any ideas?

gaurav commented 3 months ago

> @gaurav is "NameUnknown" a possible value that the Name Res or Node Norm can return? Trying to get to the bottom of where this label is coming from.

It's not in the source code for either NodeNorm, NameLookup/NameRes or Babel. I'm checking to see if there's any synonym with "NameUnknown" as a synonym, but I doubt it. [Update: nope, not in any of our synonym files.]

cbizon commented 3 months ago

@MarkDWilliams when the ARS processes the ARA results, is it adding NameUnknown where there is no label?

I just ran the CACNA1A and there were only a couple of NameUnknowns. They looked like CHEMBLs that no longer exist in CHEMBL.

MarkDWilliams commented 3 months ago

This should resolve the issue: nodes for which we are unable to get a "proper" name will now have their CURIE assigned as their name instead of NameUnknown. https://github.com/NCATSTranslator/Relay/pull/625
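The fallback described here can be illustrated roughly as follows. This is not the code from the Relay PR, only a sketch of the stated behavior; `display_name` and `PLACEHOLDER` are hypothetical names, and treating a blank or placeholder name the same as a missing one is an assumption.

```python
from typing import Optional

# The placeholder label seen in the results discussed in this issue.
PLACEHOLDER = "NameUnknown"


def display_name(curie: str, name: Optional[str]) -> str:
    """Return a usable label, falling back to the CURIE when no proper name exists."""
    if name is None:
        return curie
    cleaned = name.strip()
    # ASSUMPTION: blank names and the placeholder (in any casing) count as "no name".
    if not cleaned or cleaned.lower() == PLACEHOLDER.lower():
        return curie
    return cleaned
```

With this shape, a node that has a label keeps it (`display_name("MONDO:0011426", "Aceruloplasminemia")` returns the label), while a node without one shows its identifier, which at least lets a user look the entity up.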

sierra-moxon commented 3 months ago

retested on test: https://ui.test.transltr.io/main/results?l=CACNA1A%20(Human)&i=NCBIGene:773&t=1&r=0&q=4f2bad81-fcdf-41c9-b392-7a453ebaf4c3

seeing Nameunknown in the results, likely this hasn't been deployed yet.

MarkDWilliams commented 3 months ago

Yes, I'm sorry, I should have clarified that the PR should fix the issue in CI. We've put in the request to deploy it to Test, but it hasn't gone through yet.

sstemann commented 3 months ago

fixed in Test https://ui.test.transltr.io/main/results?l=Aceruloplasminemia&i=MONDO:0011426&t=0&r=0&q=a2b8ff98-8af7-4f47-a8a2-08f4159e7d0f

image