IHTSDO / snomed-owl-toolkit

The official SNOMED CT OWL Toolkit. OWL conversion, classification and authoring support.

From UMLS CUIs to SNOMED OWL #72

Closed davidshumway closed 2 years ago

davidshumway commented 2 years ago

Is it possible to go from a list of UMLS CUIs which show a root source of SNOMED to a subset of the SNOMED ontology containing these CUIs? For example: (Salt water: C0337055, Coastal water: C0442542).

kaicode commented 2 years ago

Hi @davidshumway. I am not familiar with UMLS but this should be possible.

At a high level it would be a two-step process:

  1. Convert from a set of UMLS CUIs to SNOMED CT SCTIDs.
  2. Extract a subset of axioms from the ontology.

There ought to be a table / column in UMLS with the SNOMED identifier in SCTID format. These examples may help you find them: Salt water (substance): 46031004, Coastal water (environment): 257589004.
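As a rough sketch of step 1, the lookup could be done against the UMLS `MRCONSO.RRF` file. This rests on several assumptions that are not confirmed in this thread: that the file is pipe-delimited with the CUI in field 0, the source abbreviation (SAB) in field 11 and the source code in field 13, and that rows sourced from the US edition of SNOMED CT carry the SAB value `SNOMEDCT_US` with the SCTID in the code field. The function name `cuis_to_sctids` is just illustrative.

```python
from collections import defaultdict

# Assumed zero-based field positions in MRCONSO.RRF; check them against
# the UMLS documentation for your release before relying on this.
CUI, SAB, CODE = 0, 11, 13


def cuis_to_sctids(mrconso_lines, cuis, sab="SNOMEDCT_US"):
    """Map each requested CUI to the set of SNOMED CT codes listed for it.

    mrconso_lines: any iterable of lines, e.g. an open MRCONSO.RRF file.
    cuis: the UMLS CUIs of interest.
    sab: source abbreviation to keep (assumed value for SNOMED CT US).
    """
    wanted = set(cuis)
    found = defaultdict(set)
    for line in mrconso_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= CODE:
            continue  # skip blank or malformed lines
        if fields[CUI] in wanted and fields[SAB] == sab:
            found[fields[CUI]].add(fields[CODE])
    return dict(found)
```

Typical use would be `with open("MRCONSO.RRF", encoding="utf-8") as f: mapping = cuis_to_sctids(f, ["C0337055", "C0442542"])`. A CUI can map to more than one SCTID, which is why the values are sets; CUIs with no matching row are simply absent from the result.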

Once you have those, you have a few options for extracting part of the ontology. You will need the SNOMED CT release in the standard RF2 format. This contains all the OWL axioms within the OWL Expression reference set snapshot file (for example SnomedCT_USEditionRF2_PRODUCTION_20220301T120000Z/Snapshot/Terminology/sct2_sRefset_OWLExpressionSnapshot_US1000124_20220301.txt).

You could pull out all the active rows from that file that match those SCTIDs in the referencedComponentId column. That will give you the axioms but won't make a nice ontology because the referenced concepts are likely to be missing. For example:

08ab3d15-f9bb-4ad3-bd40-eb08754aa01c    20190731    1   900000000000207008  733073007   46031004    SubClassOf(:46031004 :312440002)

This is the axiom for "Salt water (substance)" http://snomed.info/id/46031004 which states that the concept is a subtype of "Natural form of water (substance)" http://snomed.info/id/312440002. The axiom for the latter concept could be fetched too but I don't have a script for this to hand.
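For illustration, here is a minimal Python sketch of that approach: it keeps the active rows for a set of SCTIDs and then follows the concepts referenced in their axioms. It is not part of the toolkit. It assumes the snapshot file is tab-separated with a header row and the column order seen in the row above (active flag in column 3, referencedComponentId in column 6, OWL expression in column 7), and that concepts appear in expressions in the `:<SCTID>` form. The function names are made up for this example.

```python
import re

# Assumed zero-based column positions in the OWL Expression refset file.
ACTIVE, REFERENCED_COMPONENT_ID, OWL_EXPRESSION = 2, 5, 6

# Concept references written with the default prefix, e.g. ":46031004".
CONCEPT_REF = re.compile(r":(\d+)")


def load_active_axioms(lines):
    """Return {sctid: [owlExpression, ...]} built from active rows only."""
    axioms = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # The header row is skipped too, since its active field is not "1".
        if len(fields) <= OWL_EXPRESSION or fields[ACTIVE] != "1":
            continue
        sctid = fields[REFERENCED_COMPONENT_ID]
        axioms.setdefault(sctid, []).append(fields[OWL_EXPRESSION])
    return axioms


def extract_with_references(axioms, seed_sctids):
    """Collect axioms for the seeds and, transitively, for every concept
    that those axioms reference."""
    selected = {}
    pending = list(seed_sctids)
    while pending:
        sctid = pending.pop()
        if sctid in selected or sctid not in axioms:
            continue
        selected[sctid] = axioms[sctid]
        for expression in axioms[sctid]:
            pending.extend(CONCEPT_REF.findall(expression))
    return selected
```

Usage would look like `with open(path, encoding="utf-8") as f: axioms = load_active_axioms(f)` followed by `extract_with_references(axioms, ["46031004", "257589004"])`. Because every reference is followed, the result includes all ancestors up to the root as well as any attribute concepts used in the axioms, so it can be much larger than the seed set. The output is still a bag of axiom strings rather than a complete ontology document, and it carries no labels.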

Another way could be to generate the complete OWL ontology using this project (snomed-owl-toolkit), then copy the parts you need. That would also be a manual process but with the advantage that the OWL ontology would contain the rdfs:label and skos:prefLabel for each concept, rather than having to pull those out of the RF2 description and language reference set files.

The final alternative would be to generate a subontology of the set of concepts. This would automatically pull in other supporting concepts that are required to preserve the semantics of the signature set. There is a project to create SNOMED CT subontologies here: https://github.com/IHTSDO/snomed-subontology-extraction

I hope that gives you some options to think about for your use case. Let me know if I can help further.