RobokopU24 / Feedback

Feedback on the ROBOKOP project
https://robokop.renci.org

Tutorial example query is broken #196

Closed: karafecho closed this issue 4 months ago

karafecho commented 5 months ago

A user notified me that the example one-hop query posted in the tutorial is broken, and I verified that it is not behaving as expected. This is a major issue because the query is included not only in the tutorial but also in multiple slide decks that have been shared with users.

  1. The drop-down menu appears to be returning GO terms, e.g., GO:0019341 (also see screenshot). This is true for both 2,3,7,8-tetrachlorodibenzo-P-dioxin and tetrachlorodibenzo-P-dioxin.

image

  2. The CURIE that is posted in the tutorial (UMLS:C003965) no longer returns results. Name Resolver indicates that PUBCHEM.COMPOUND:15625 is the correct CURIE.
  3. The query runs when PUBCHEM.COMPOUND:15625 is entered for n0. However, neoplasm is no longer the top answer. That alone is not necessarily problematic, but (a) I'm no longer seeing the occurs_together_in_literature_with edge and (b) cancers are no longer as richly represented in the answer set, although there are now 10x more answers, so perhaps that's not surprising.

EvanDietzMorris commented 5 months ago

There are a few separate issues here:

  1. The custom name resolver hooked up to the UI is not returning the proper ID for "2,3,7,8-tetrachlorodibenzo-P-dioxin". You can see the normal one does here: https://name-resolution-sri.renci.org/lookup?string=2%2C3%2C7%2C8-tetrachlorodibenzo-P-dioxin

  2. We have a bad CURIE in the tutorial. As far as I can tell, that UMLS ID does not exist. I think UMLS identifiers starting with C are Concept Unique Identifiers and will always have 7 digits. UMLS:C0039651 is Tetracyclines, so maybe that's where it came from? See https://www.google.com/search?q=UMLS+%22C003965%22 and https://nodenormalization-sri.renci.org/get_normalized_nodes?curie=UMLS%3AC003965

  3. We must be running different queries somehow. When I select that example query through the UI, it does use PUBCHEM.COMPOUND:15625 to populate the query, and I do get neoplasm as the top answer. I don't see occurs_together_in_literature_with edges, but those come from omnicorp and not the robokop graph.
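
For point 2, a format-only check along these lines would flag the tutorial CURIE. This is a minimal sketch that assumes the "C plus seven digits" rule mentioned above holds. `looks_like_umls_cui` is a hypothetical helper, and passing the check says nothing about whether the concept actually exists in UMLS or in robokopkg.

```python
import re

# Assumption from the comment above: a UMLS CUI is "C" followed by exactly 7 digits.
CUI_PATTERN = re.compile(r"UMLS:C\d{7}")


def looks_like_umls_cui(curie: str) -> bool:
    """Return True if the CURIE has the shape UMLS:C#######."""
    return CUI_PATTERN.fullmatch(curie) is not None


# The tutorial CURIE has only six digits; the Tetracyclines CUI has seven.
for curie in ("UMLS:C003965", "UMLS:C0039651"):
    print(curie, looks_like_umls_cui(curie))
```

A check like this only catches malformed identifiers. Confirming that a well-formed CURIE resolves would still need a lookup against Node Normalizer.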

EvanDietzMorris commented 5 months ago

@karafecho do you have the text from previous TRAPI results for this query? It's going to be hard to track down where the occurs_together_in_literature_with edges went without knowing what they were before.

Edit: I see now that the tutorial has some screenshots of the now-missing omnicorp edge, but in general having the TRAPI queries/results in text is far more helpful for troubleshooting stuff like this.

karafecho commented 5 months ago

Re (2): I'm not understanding the issue with the CURIE. I tested every example that I put into the tutorial and elsewhere, so unless I somehow entered a typo, the CURIE that is currently in the tutorial should work. Plus, I know of at least two users who successfully used the tutorial to get started learning ROBOKOP. However, I guess it's possible that they were unsuccessful and simply didn't feel comfortable letting me know, or they didn't use the CURIE. All that said, see https://github.com/RobokopU24/Use-Cases/issues/1, which I created after I posted the tutorial. Maybe I did just enter the wrong CURIE in the tutorial?

Re (3), do you mean when you enter PUBCHEM.COMPOUND:15625, you get neoplasm as the top answer? But no occurs_together_in_literature_with edges?

Here's a couple of screenshots plus the JSON results. Unfortunately, I don't think I saved the original query/results, although I may be able to dig them up. I generally try to save results, but this one must have slipped through.

image

image

ROBOKOP_message(35).json

karafecho commented 5 months ago

I'm wondering if the missing OmniCorp edges are related to the CURIE issue?

EvanDietzMorris commented 5 months ago

2) I suspect the wrong CURIE just got put into the tutorial. I don't think that would've ever worked, given that it doesn't seem to be a real UMLS ID and I couldn't find it in the new or old versions of robokopkg. The "sample query" from the drop-down in the UI has PUBCHEM.COMPOUND:15625 and so did your use case link, so I suspect the bogus UMLS ID has only been in the tutorial.

3) Looks like we are running different queries: the sample query in the UI and the tutorial use associated_with, not related_to. The associated_with query returns neoplasm as the top result, and running it with related_to instead explains why you're getting more results than expected.
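
To make the difference in point 3 concrete, here is a rough sketch of the two one-hop queries as TRAPI-style query graphs. The n1 category (`biolink:Disease`) is an assumption, since the thread does not say which category the tutorial uses, and `one_hop_query` is a hypothetical helper rather than anything from the ROBOKOP code. As I understand the Biolink hierarchy, related_to is an ancestor of associated_with, which would be consistent with the larger answer set.

```python
import json


def one_hop_query(subject_curie: str, predicate: str, object_category: str) -> dict:
    """Build a TRAPI-style one-hop query graph: n0 -[predicate]-> n1."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": [subject_curie]},
                    "n1": {"categories": [object_category]},
                },
                "edges": {
                    "e0": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": [predicate],
                    },
                },
            }
        }
    }


# Assumed n1 category; not stated anywhere in this thread.
N1_CATEGORY = "biolink:Disease"

# The predicate used by the tutorial and the UI sample query.
tutorial = one_hop_query("PUBCHEM.COMPOUND:15625", "biolink:associated_with", N1_CATEGORY)
# The broader predicate that produced the larger answer set.
broader = one_hop_query("PUBCHEM.COMPOUND:15625", "biolink:related_to", N1_CATEGORY)

print(json.dumps(tutorial, indent=2))
```

Diffing the two JSON documents shows that only the `predicates` entry on `e0` changes, which is easy to miss when comparing screenshots.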

Re omnicorp: I think something is just broken with the omnicorp edges at the moment. I don't see them on any queries, even when the logs show they are coming back from the service. I suspect the versions of omnicorp and Aragorn we have on robokop-u24 are out of sync and we need to upgrade to the latest versions. I was planning on doing that soon, but now I'll bump it to high priority and try to get it done tomorrow. Hopefully that fixes the issue.

karafecho commented 5 months ago

(2) Yeah, that's what it's sounding like. Weird. But at least that's an easy fix.

(3) Ahhh, I didn't realize that I used associated_with. That resolves that issue.

Re OmniCorp: Thanks for troubleshooting. I'll bet you're right.

Assuming the OmniCorp issue is due to the syncing issue, then there are two remaining bugs:

  1. The typo in the tutorial. I can fix that.
  2. The custom name resolver hooked up to the UI is not returning the proper ID for "2,3,7,8-tetrachlorodibenzo-P-dioxin".

EvanDietzMorris commented 5 months ago

Agreed.. fingers crossed the omnicorp issue is easily resolved by upgrading versions. David is looking into the name resolver issue.

EvanDietzMorris commented 5 months ago

The omnicorp issue persists. The correct edges are being returned by Aragorn but something is preventing the UI from showing them. David is looking into this as well.

karafecho commented 5 months ago

Update re incorrect CURIE: https://github.com/RobokopU24/qgraph/issues/273.

Woozl commented 4 months ago

Sorry for the delay. There are a couple of issues here. I think the missing node is actually missing from the underlying babel files. I need to see which version of the synonym files https://name-resolution-sri.renci.org/ is using, but the robokop nameres is using the latest set: https://stars.renci.org/var/babel_outputs/2024jan9/synonyms/2024jan5/

However, 2,3,7,8-tetrachlorodibenzo-P-dioxin (PUBCHEM.COMPOUND:15625) doesn't seem to actually exist in that set, based on this grep:

zgrep "curie\": \"PUBCHEM.COMPOUND:15625\"" /projects/stars/var/babel_outputs/2024jan9/synonyms/2024jan5/*.txt.gz

I need to ask Gaurav what versions we're running on the translator instance, which would hopefully clear this up.
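
For anyone who wants the matching record itself rather than just a grep hit, the same search could be sketched in Python. This assumes each line of the synonym files is a JSON object with a "curie" field, which is what the grep pattern suggests but is not confirmed here. `find_curie` is a hypothetical helper.

```python
import glob
import gzip
import json


def find_curie(pattern: str, curie: str) -> list[tuple[str, dict]]:
    """Return (filename, record) pairs whose "curie" field equals `curie`."""
    hits = []
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            for line in handle:
                # Cheap substring filter before paying for JSON parsing.
                if curie not in line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip lines that are not valid JSON
                if isinstance(record, dict) and record.get("curie") == curie:
                    hits.append((path, record))
    return hits
```

Calling `find_curie("/projects/stars/var/babel_outputs/2024jan9/synonyms/2024jan5/*.txt.gz", "PUBCHEM.COMPOUND:15625")` would mirror the zgrep above. An empty list means no file contains a record with that exact CURIE.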

EvanDietzMorris commented 4 months ago

In our devops repo, it looks like 2023nov5 is the version currently deployed on https://name-resolution-sri.renci.org/, but we need to check with Gaurav to confirm. If 2,3,7,8-tetrachlorodibenzo-P-dioxin doesn't exist in the new files, and we just haven't deployed them to name resolution and node normalizer yet, that would explain this "bug". Either way, going forward we need to make sure this instance of name resolution uses the same babel data that was used to normalize robokopkg. ORION saves the version of node normalization (such as "2.3.5") from https://nodenormalization-sri.renci.org/openapi.json but doesn't know which version of babel that entails. Let's discuss with Gaurav.

Woozl commented 4 months ago

The newest nameres deployment is using the 2023nov5 set, which has the 2,3,7,8-tetrachlorodibenzo-P-dioxin node, so this specific issue should be resolved. Gaurav has opened an issue for the 2024jan5 set at https://github.com/TranslatorSRI/Babel/issues/242
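
For spot-checking after a redeploy, the lookup and normalization links used earlier in this thread can be rebuilt with a small helper. This is only a sketch: the base URLs and paths are copied from those links, `lookup_url` and `normalize_url` are hypothetical names, and the robokop-specific nameres host is not given in this thread, so it would have to be passed in as `base`.

```python
from urllib.parse import urlencode

# Base URLs as they appear in the links earlier in this thread.
NAME_RESOLVER = "https://name-resolution-sri.renci.org"
NODE_NORMALIZER = "https://nodenormalization-sri.renci.org"


def lookup_url(name: str, base: str = NAME_RESOLVER) -> str:
    """URL for a Name Resolver lookup of a free-text name."""
    query = urlencode({"string": name})
    return f"{base}/lookup?{query}"


def normalize_url(curie: str, base: str = NODE_NORMALIZER) -> str:
    """URL for a Node Normalizer query on a single CURIE."""
    query = urlencode({"curie": curie})
    return f"{base}/get_normalized_nodes?{query}"


print(lookup_url("2,3,7,8-tetrachlorodibenzo-P-dioxin"))
print(normalize_url("PUBCHEM.COMPOUND:15625"))
```

Building the URLs in code avoids hand-encoding the commas and colon, which is an easy place to introduce the kind of typo discussed above.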