NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

Genomic Entity should not be a mixin in Biolink #586

Open sstemann opened 1 year ago

sstemann commented 1 year ago

Used the Test UI to run "What genes may be downregulated by:Metformin"

PK: 4cb4ba22-c60e-4652-ac9f-b2e8b7a2306a UI: https://ui.test.transltr.io/main/results?l=Metformin&i=PUBCHEM.COMPOUND:4091&t=4&q=4cb4ba22-c60e-4652-ac9f-b2e8b7a2306a

ARAGORN: image

ARAX: image

BTE: image

saramsey commented 1 year ago

Based on our team's investigation, the issue causing the validation error in the ARAX TRAPI result (the error, not the warnings) is that two result nodes, UMLS:C1709820 ("ROS1 wt Allele") and UMLS:C3890397 ("PRKAA1 wt Allele"), are both annotated in the TRAPI result with a category of biolink:GenomicEntity. That is not allowed, because GenomicEntity is a "mixin" per the Biolink YAML model.

So, where is the "genomic entity" category coming from? Based on our investigation, we do not think it is coming from the current production RTX-KG2 graph (i.e., RTX-KG2.8.4c). But I notice that when I query the SRI Node Normalization service with this request:

https://nodenormalization-sri.renci.org/1.4/get_normalized_nodes?curie=UMLS%3AC1709820&conflate=true&drug_chemical_conflate=false&description=false

I get back "Genomic Entity" in the response:

{
  "UMLS:C1709820": {
    "id": {
      "identifier": "UMLS:C1709820",
      "label": "ROS1 wt Allele"
    },
    "equivalent_identifiers": [
      {
        "identifier": "UMLS:C1709820",
        "label": "ROS1 wt Allele"
      }
    ],
    "type": [
      "biolink:GenomicEntity"
    ]
  }
}

The same goes for when I query with the other CURIE:

https://nodenormalization-sri.renci.org/1.4/get_normalized_nodes?curie=UMLS%3AC3890397&conflate=true&drug_chemical_conflate=false&description=false

I get back a response with "Genomic Entity":

{
  "UMLS:C3890397": {
    "id": {
      "identifier": "UMLS:C3890397",
      "label": "PRKAA1 wt Allele"
    },
    "equivalent_identifiers": [
      {
        "identifier": "UMLS:C3890397",
        "label": "PRKAA1 wt Allele"
      }
    ],
    "type": [
      "biolink:GenomicEntity"
    ]
  }
}
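For anyone who wants to repeat this check on other CURIEs, here is a minimal sketch in Python. It assumes the response shape shown in the two examples above. The helper names `build_url` and `mixin_only_curies` and the hard-coded `KNOWN_MIXINS` set are illustrative assumptions, not part of any Translator library; a real check would read the mixin flags from the Biolink YAML model instead.

```python
from urllib.parse import urlencode

# Endpoint copied from the requests quoted above.
NODENORM_URL = "https://nodenormalization-sri.renci.org/1.4/get_normalized_nodes"

# Assumption: hard-coded for illustration. A real check would load the
# set of mixin classes from the Biolink YAML model.
KNOWN_MIXINS = {"biolink:GenomicEntity"}


def build_url(curie):
    """Build the same query URL used above for a single CURIE."""
    params = {
        "curie": curie,
        "conflate": "true",
        "drug_chemical_conflate": "false",
        "description": "false",
    }
    return f"{NODENORM_URL}?{urlencode(params)}"


def mixin_only_curies(response):
    """Return the CURIEs whose every reported type is a known mixin."""
    flagged = []
    for curie, entry in response.items():
        if entry is None:  # assumption: an unresolved CURIE may map to null
            continue
        types = entry.get("type", [])
        if types and all(t in KNOWN_MIXINS for t in types):
            flagged.append(curie)
    return flagged


# Trimmed copy of the first response shown above.
sample = {
    "UMLS:C1709820": {
        "id": {"identifier": "UMLS:C1709820", "label": "ROS1 wt Allele"},
        "type": ["biolink:GenomicEntity"],
    }
}
print(mixin_only_curies(sample))  # ['UMLS:C1709820']
```

The response would normally come from fetching `build_url(curie)` with any HTTP client and parsing the JSON body; the sample dictionary stands in for that step so the sketch runs offline.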

Tagging and assigning @gaurav and the SRI crew on the node normalization issue.

saramsey commented 1 year ago

Since all signs point to SRI Node Normalizer service at this point (at least, as concerns the validation error), I'm deassigning myself for now. Also marking as bug. If something else comes up that needs our team's input, please LMK.

gaurav commented 1 year ago

Thanks so much for the detailed analysis, @saramsey! Yup, this is definitely a NodeNorm issue: the problem is that UMLS semantic type T028 "Gene or Genome" is mapped to the mixin biolink:GenomicEntity in the Biolink model. As long as a clique has a non-mixin semantic type as well, all will be well, but if a clique (like the ones in this ticket) only has T028 as a semantic type, we will give it a type of biolink:GenomicEntity and it will not have any parent types. I've opened a ticket in the Babel repository to figure out how widespread this problem is and whether simply modifying the Biolink mapping will fix it, or if a more sophisticated fix is necessary (https://github.com/TranslatorSRI/Babel/issues/196).
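The failure mode described in this comment can be sketched roughly as follows. This is an illustration only and not Babel's actual code: the function names and the toy mapping table are hypothetical, and the `T_OTHER` entry is a made-up placeholder for any semantic type that maps to a concrete class. Only the T028 to biolink:GenomicEntity mapping comes from this thread.

```python
# Toy mapping from UMLS semantic types to Biolink classes.
# Only the T028 entry is taken from this thread; "T_OTHER" is a made-up
# placeholder standing in for any semantic type with a concrete class.
SEMANTIC_TYPE_TO_BIOLINK = {
    "T028": "biolink:GenomicEntity",
    "T_OTHER": "biolink:Gene",
}

# Classes declared as mixins (assumption: hard-coded for illustration).
MIXINS = {"biolink:GenomicEntity"}


def clique_types(semantic_types):
    """Map a clique's semantic types to Biolink classes, preserving order."""
    return [
        SEMANTIC_TYPE_TO_BIOLINK[s]
        for s in semantic_types
        if s in SEMANTIC_TYPE_TO_BIOLINK
    ]


def has_only_mixin_types(semantic_types):
    """True when every mapped type is a mixin, so no concrete class remains."""
    types = clique_types(semantic_types)
    return bool(types) and all(t in MIXINS for t in types)


print(has_only_mixin_types(["T028"]))             # True: the problem case
print(has_only_mixin_types(["T028", "T_OTHER"]))  # False: a concrete type exists
```

A scan like this over all cliques would be one way to estimate how widespread the problem is before deciding between a mapping change and a more involved fix.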

saramsey commented 1 year ago

Thanks @gaurav. Maybe consider opening an issue in the biolink/biolink-model GitHub repository. My $0.02: we need a concrete class that is a reasonable approximation of T028. The simple solution here is to just promote GenomicEntity to be a concrete class, and not a mixin. Seems like a pragmatic solution to me.

gaurav commented 1 year ago

That's a good idea! I've opened that issue at https://github.com/biolink/biolink-model/issues/1405

sstemann commented 8 months ago

@sierra-moxon @gaurav do you guys think this will change for lobster1?

sstemann commented 6 months ago

what about Octopus2? @sierra-moxon @gaurav

sierra-moxon commented 2 months ago

This is a fairly difficult change to the model, as it represents a break in the ChemicalEntity/BiologicalEntity hierarchy. Moving to Hammerhead.

gaurav commented 1 week ago

I don't think this made it into Hammerhead, so I think we should push this into the next phase.

sstemann commented 2 days ago

Hammerhead, Test

https://ui.test.transltr.io/results?l=Metformin&i=CHEBI:6801&t=4&r=0&q=e0eeddf5-b9fa-427a-bc75-406d10f85071

image