ExposuresProvider / icees-api

MIT License

Illegal prefixes in /meta_knowledge_graph #125

Open cbizon opened 3 years ago

cbizon commented 3 years ago

e.g.

"biolink:Disease, DiseaseOrPhenotypicFeature": {
      "id_prefixes": [
        "MONDO",
        "PUBCHEM",
        "UMLSCUI",
        "UMLSCUOI",
        "LOINC",
        "OMIM",
        "ICD10",
        "ICD9",
        "HP",
        "SCTID",
        "CHEBI",
        "UMLSCUIC0023895",
        "NCIT",
        "MESH",
        "ICD10R",
        "SCITD",
        "UMLS",
        "MeSH"
      ]

Many of these are not allowed prefixes in the biolink model, including UMLSCUI, UMLSCUOI, etc.

cbizon commented 3 years ago

This is true for many other types as well.

karafecho commented 3 years ago

We completed a lot of identifier mappings prior to the adoption of standards by Translator. My understanding was that 'extra' prefixes weren't an issue?

cbizon commented 3 years ago

Many of the ones in the list I posted are clearly wrong either in formatting or type.

I am unsure whether it's legal to provide extras, but if there is a reason that these are legal prefixes, then they should be added to the model so that others know they are valid. Otherwise nobody will call you using them.

cbizon commented 3 years ago

@patrickkwang is the key here valid? "biolink:Disease, DiseaseOrPhenotypicFeature" ?

patrickkwang commented 3 years ago

@cbizon No, the key is bad.

cbizon commented 3 years ago

Would the code in plater for creating /meta_knowledge_graph be helpful?

xu-hao commented 3 years ago

We have fixed some of the typos. We still need to check whether all identifiers are accepted by TRAPI now.

cbizon commented 3 years ago

There are still a lot of things that I'm sure won't validate. You can use the biolink model toolkit (or bl-lookup service) to easily find out what the allowed prefixes for a given concept are.
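
As a rough illustration of that check, here is a minimal sketch that compares the advertised prefixes with an allowed list. The function name `illegal_prefixes` is hypothetical, and `ASSUMED_ALLOWED` is only a placeholder subset; the real list should come from the biolink model toolkit or bl-lookup.

```python
def illegal_prefixes(advertised, allowed):
    """Return advertised prefixes missing from the allowed list.

    The comparison is case-sensitive, so "MeSH" does not match "MESH".
    Order is preserved and repeated prefixes are reported once.
    """
    allowed_set = set(allowed)
    seen = set()
    bad = []
    for prefix in advertised:
        if prefix not in allowed_set and prefix not in seen:
            seen.add(prefix)
            bad.append(prefix)
    return bad


# ASSUMPTION: placeholder subset, not the authoritative Biolink list.
# Replace it with the prefixes returned by the toolkit or bl-lookup.
ASSUMED_ALLOWED = ["MONDO", "OMIM", "UMLS", "MESH", "NCIT", "HP"]

# A few of the prefixes quoted earlier in this issue.
advertised = ["MONDO", "UMLSCUI", "UMLSCUOI", "SCITD", "UMLS", "MeSH"]

print(illegal_prefixes(advertised, ASSUMED_ALLOWED))
```

Running a check like this against every category in /meta_knowledge_graph would list the entries that still need fixing.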

It might also make sense to validate annotation CURIEs. It seems as though there are annotations that lack a ":" and end up creating new prefixes like "PUBCHEM123124". That's my guess anyway.
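
A minimal sketch of such a check, assuming a CURIE is valid when it has a non-empty prefix, a ":", and a non-empty local identifier. The helper names `split_curie` and `malformed_curies` are hypothetical and are not part of ICEES.

```python
def split_curie(curie):
    """Split 'PREFIX:local_id' into (prefix, local_id).

    Raises ValueError when the ':' separator, the prefix,
    or the local identifier is missing.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id


def malformed_curies(curies):
    """Return the entries that cannot be split into prefix and local id."""
    bad = []
    for curie in curies:
        try:
            split_curie(curie)
        except ValueError:
            bad.append(curie)
    return bad


# "PUBCHEM123124" has no ":", so it would otherwise be read as a new prefix.
print(malformed_curies(["MONDO:0004979", "PUBCHEM123124", "ICD10:J45"]))
```

This only checks the shape of each identifier; whether the prefix itself is allowed is a separate check.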

cbizon commented 3 years ago

Also, just to be clear, problems with meta_knowledge_graph are currently preventing strider from getting information from ICEES.

Take the query Asthma -[correlated with]-> chemical.

Right now, strider returns no ICEES results, and it will not until this is fixed.

The problem is not only the prefixes, but their order, which is meaningful.

Here are the id_prefixes for Disease:

"biolink:Disease": {
      "id_prefixes": [
        "ICD10R",
        "ICD10",
        "UMLS",
        "UMLSCUIC0023895",
        "OMIM",
        "UMLSCUI",
        "SCTID",
        "MONDO",
        "SCITD",
        "MESH",
        "ICD9",
        "NCIT",
        "CHEBI",
        "HP",
        "CPT",
        "PUBCHEM",
        "UMLSCUOI",
        "LOINC",
        "IC10"
      ]
    },

What this means is: ICEES wants you to send it ICD10R codes. If the concept doesn't have an ICD10R code, it wants you to send an ICD10 code, and if it doesn't have that, please send it a UMLS code, and so on.

According to nodenorm, here are the possible values for asthma:

https://nodenormalization-sri.renci.org/1.1/get_normalized_nodes?curie=MONDO%3A0004979

So, it doesn't have an ICD10R code (which is not a biolink allowed prefix anyway). But it does have an ICD10, ICD10:J45. Strider compares the nodenorm results to the meta_knowledge_graph results, and says "oh, ICEES wants this ICD10 code" and sends that query.

However, ICEES doesn't return any results for that ICD10 code. It does return results for the MONDO code. End result: no ICEES results for strider.
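
The selection described above can be sketched roughly as follows. This is not strider's actual code; `choose_identifier` is a hypothetical helper, and the equivalent identifiers are limited to the two named in this thread.

```python
def choose_identifier(id_prefixes, equivalent_curies):
    """Pick the equivalent CURIE whose prefix comes first in id_prefixes.

    Returns None when none of the advertised prefixes is available.
    """
    for prefix in id_prefixes:
        for curie in equivalent_curies:
            if curie.split(":", 1)[0] == prefix:
                return curie
    return None


# id_prefixes currently advertised for biolink:Disease, in the order shown above.
advertised = [
    "ICD10R", "ICD10", "UMLS", "UMLSCUIC0023895", "OMIM", "UMLSCUI",
    "SCTID", "MONDO", "SCITD", "MESH", "ICD9", "NCIT", "CHEBI", "HP",
    "CPT", "PUBCHEM", "UMLSCUOI", "LOINC", "IC10",
]

# Two of the equivalent identifiers for asthma mentioned in this thread.
asthma = ["MONDO:0004979", "ICD10:J45"]

print(choose_identifier(advertised, asthma))          # ICD10:J45
print(choose_identifier(["MONDO", "ICD10"], asthma))  # MONDO:0004979
```

With the current order the ICD10 code wins, and ICEES has no results for it. Listing MONDO ahead of ICD10 would make the same logic send the MONDO identifier instead.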

karafecho commented 3 years ago

@cbizon: Thanks for the detailed explanation. Very helpful! As it turns out, neither Hao nor I was aware that the order of the prefixes mattered. We will fix that. We're also still working to fix the CURIEs.

Just so you are aware, we are in the process of overhauling the way we handle our API config files, so while the issues delineated here and elsewhere are taking a while to resolve, things should proceed much more smoothly moving forward.