biolink / biolink-model

Schema and generated objects for biolink data model and upper ontology
https://biolink.github.io/biolink-model/

Biolink Categories and Node Norm Mappings #1156

Closed karafecho closed 1 year ago

karafecho commented 1 year ago

Question:

This issue is a follow-up from yesterday's Biolink HELP Desk call. I've resolved the main issue that prompted me to join the call; however, there are related issues that remain unresolved.

Essentially, I am hoping to solicit expert opinion from the Biolink team on several Node Norm Biolink category mappings that seem questionable to me.

For instance, in response to the input CURIE PUBCHEM.COMPOUND:241 (benzene), Node Norm returns the following:

    ],
    "type": [
      "biolink:SmallMolecule",
      "biolink:MolecularEntity",
      "biolink:ChemicalEntity",
      "biolink:PhysicalEssence",
      "biolink:ChemicalOrDrugOrTreatment",
      "biolink:ChemicalEntityOrGeneOrGeneProduct",
      "biolink:ChemicalEntityOrProteinOrPolypeptide",
      "biolink:NamedThing",
      "biolink:Entity",
      "biolink:PhysicalEssenceOrOccurrent"
    ],

In response to the input CURIE NCIT:C36251 (benzene exposure), Node Norm returns the following:

    "type": [
      "biolink:PhenotypicFeature",
      "biolink:DiseaseOrPhenotypicFeature",
      "biolink:ThingWithTaxon",
      "biolink:BiologicalEntity",
      "biolink:NamedThing",
      "biolink:Entity"
    ],

In both examples, biolink:EnvironmentalExposure is not returned. This makes sense for the first query, although it presents challenges for ICEES KG's meaning of 'benzene exposure'; however, it does not make sense (at least to me) for the second query. In fact, the output for the second query seems a bit strange to me.

To provide another example, here's a Node Norm response for UMLS:C0019993 (hospitalization):

{
  "UMLS:C0019993": {
    "id": {
      "identifier": "UMLS:C0019993",
      "label": "Hospitalization"
    },
    "equivalent_identifiers": [
      {
        "identifier": "UMLS:C0019993",
        "label": "Hospitalization"
      }
    ],
    "type": [
      "biolink:Activity",
      "biolink:ActivityAndBehavior",
      "biolink:NamedThing",
      "biolink:Entity",
      "biolink:Occurrent",
      "biolink:PhysicalEssenceOrOccurrent"
    ]
  },
  "": null
}

Again, these mappings seem strange to me. In fact, in the ICEES KG config files, I mapped "hospitalization" to biolink:ClinicalIntervention.
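The comparison described above can be sketched as a small check over the "type" lists quoted in this issue. This is only an illustration: the `observed_types` lists are abbreviated copies of the Node Norm output shown above, the `expected_category` entries are the categories I would have expected (not anything Node Norm or Biolink asserts), and the helper name `missing_expected_categories` is hypothetical.

```python
# "type" lists copied (abbreviated) from the Node Norm responses quoted above.
observed_types = {
    "NCIT:C36251": [
        "biolink:PhenotypicFeature",
        "biolink:DiseaseOrPhenotypicFeature",
        "biolink:ThingWithTaxon",
        "biolink:BiologicalEntity",
        "biolink:NamedThing",
        "biolink:Entity",
    ],
    "UMLS:C0019993": [
        "biolink:Activity",
        "biolink:ActivityAndBehavior",
        "biolink:NamedThing",
        "biolink:Entity",
        "biolink:Occurrent",
        "biolink:PhysicalEssenceOrOccurrent",
    ],
}

# Assumption: the categories the poster expected, taken from the discussion above.
expected_category = {
    "NCIT:C36251": "biolink:EnvironmentalExposure",
    "UMLS:C0019993": "biolink:ClinicalIntervention",
}


def missing_expected_categories(observed, expected):
    """Return {curie: category} for every expected category absent from the observed types."""
    return {
        curie: category
        for curie, category in expected.items()
        if category not in observed.get(curie, [])
    }


mismatches = missing_expected_categories(observed_types, expected_category)
for curie, category in mismatches.items():
    print(f"{curie}: expected {category}, got {observed_types[curie][0]}")
```

Both CURIEs are reported as mismatches, which is the discrepancy this issue is asking about.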

Thoughts?

cbizon commented 1 year ago

Thanks @karafecho. Probably the best place for these topics is on the Babel repo. I'm going to open a couple of tickets there. My initial response is that the pubchem.compound one is correct in nodenorm, the benzene exposure one is probably wrong in nodenorm, and I'm not sure about the hospitalization one. I suspect that in that case, the assignment to a biolink class is being handled via a mapping in the biolink model to whatever the semantic type in UMLS is. (So maybe it ends up being a biolink issue after all!)

karafecho commented 1 year ago

Thanks, Chris. Actually, I thought about posting the issue to the Node Norm repo, but thought perhaps there was a reason why an entity such as hospitalization was being mapped to biolink:Activity. I would not have considered posting the issue to the Babel repo, btw.

I should note that the input CURIEs were derived from Name Resolver.

karafecho commented 1 year ago

I'll add that it might make sense to consider adding a new class and/or crafting more specific definitions for biolink:ChemicalExposure and biolink:EnvironmentalExposure.

sierra-moxon commented 1 year ago

@karafecho - hospitalization is now a child of clinical intervention in the model. Chemical exposure and environmental exposure both have descriptions in the model now, though I won't claim that they couldn't use more refactoring. We likely need to spend some time in the exposures area of the model. But, in terms of this ticket, I think the work has been done. I will close now, but please of course feel free to reopen if there are further items to consider. :)

gaurav commented 4 months ago

This is still unchanged in NodeNorm:

{
  "UMLS:C0019993": {
    "id": {
      "identifier": "UMLS:C0019993",
      "label": "Hospitalization"
    },
    "equivalent_identifiers": [
      {
        "identifier": "UMLS:C0019993",
        "label": "Hospitalization"
      }
    ],
    "type": [
      "biolink:Activity",
      "biolink:NamedThing",
      "biolink:ActivityAndBehavior",
      "biolink:Occurrent",
      "biolink:PhysicalEssenceOrOccurrent"
    ]
  }
}

This is because UMLS:C0019993 "Hospitalization" has semantic type STY:T058 "Health Care Activity", which Biolink Model maps to biolink:Activity. The UMLS term for hospitalization doesn't get mapped to biolink:Hospitalization, and -- even if it were mapped in the Biolink Model -- we don't use Biolink Model mappings in Babel, so we wouldn't pick it up for this one UMLS term.

I would propose moving STY:T058 "Health Care Activity" from a narrow mapping to biolink:Activity to a narrow mapping to biolink:ClinicalIntervention. STY:T058 has subtypes of "Diagnostic Procedure", "Therapeutic or Preventive Procedure" and "Laboratory Procedure", which I think all fit within Biolink's concept of clinical interventions.
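To illustrate the effect of this proposal, here is a minimal sketch of a semantic-type lookup. It is not Babel's actual code: the two dictionaries, the `category_for` helper, and the `biolink:NamedThing` fallback are all hypothetical, and only the STY:T058 entries come from this thread.

```python
# Hypothetical lookup tables: UMLS semantic type -> Biolink class.
# Only the STY:T058 entries are taken from this thread; real mappings live in the Biolink Model.
CURRENT_STY_TO_BIOLINK = {"STY:T058": "biolink:Activity"}
PROPOSED_STY_TO_BIOLINK = {"STY:T058": "biolink:ClinicalIntervention"}


def category_for(semantic_type, sty_to_biolink, default="biolink:NamedThing"):
    """Pick a Biolink class from a UMLS semantic type; the fallback is an assumption."""
    return sty_to_biolink.get(semantic_type, default)


# UMLS:C0019993 "Hospitalization" has semantic type STY:T058 "Health Care Activity".
hospitalization_sty = "STY:T058"

current = category_for(hospitalization_sty, CURRENT_STY_TO_BIOLINK)
proposed = category_for(hospitalization_sty, PROPOSED_STY_TO_BIOLINK)

print(f"current:  {current}")
print(f"proposed: {proposed}")
```

Because the category is chosen per semantic type rather than per term, the change would apply to every UMLS concept typed as STY:T058, not just hospitalization.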