INCATools / ontology-access-kit

Ontology Access Kit: A python library and command line application for working with ontologies
https://incatools.github.io/ontology-access-kit/
Apache License 2.0

Add adapter for UMLS NCBO-RDF, for loading RxNorm etc #427

Open cmungall opened 1 year ago

cmungall commented 1 year ago

BioPortal helpfully provides RDF for ontologies in UMLS, e.g.

https://bioportal.bioontology.org/ontologies/RXNORM

It may seem like this can be loaded directly, since OAK can handle RDF/OWL, right? Unfortunately that is not the case, which isn't surprising. RDF is extremely general; OAK supports OWL, which can be layered on RDF. However, the UMLS RDF in BioPortal does not use a standard OWL encoding of an ontology, even though it uses some parts of the OWL vocabulary.

The main thing that needs to be transformed is the mapping http://purl.bioontology.org/ontology/RXNORM/isa => rdfs:subClassOf (this has to be done on a per-ontology basis)

Another thing that would help is mapping skos:prefLabel => rdfs:label, but this is more a matter of convention, and you can provide OAK with an SSSOM mapping file for this.
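As a rough illustration of the two predicate mappings above, here is a minimal dependency-free sketch. It models triples as plain (subject, predicate, object) string tuples, which is an assumption made for brevity; a real converter would use an RDF library. The helper names `predicate_map` and `rewrite_predicates`, the terms `A` and `B`, and the label are hypothetical.

```python
# Sketch only: triples are plain (subject, predicate, object) string tuples.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def predicate_map(ontology):
    """Build the per-ontology predicate rewrite table (hypothetical helper)."""
    base = f"http://purl.bioontology.org/ontology/{ontology}/"
    return {
        base + "isa": RDFS + "subClassOf",   # differs for each ontology
        SKOS + "prefLabel": RDFS + "label",  # convention, same everywhere
    }


def rewrite_predicates(triples, mapping):
    """Yield triples with mapped predicates replaced; others pass through."""
    for s, p, o in triples:
        yield (s, mapping.get(p, p), o)


# Placeholder terms A and B; these are not real RxNorm identifiers.
rxnorm = "http://purl.bioontology.org/ontology/RXNORM/"
source = [
    (rxnorm + "A", rxnorm + "isa", rxnorm + "B"),
    (rxnorm + "A", SKOS + "prefLabel", "example label"),
]
converted = list(rewrite_predicates(source, predicate_map("RXNORM")))
```

Because the `isa` IRI is built from the ontology name, the same function could in principle be reused for other UMLS ontologies by passing a different name, though whether each one actually uses an `isa` predicate would need checking.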

Another convention that would help would be to translate

umls:cui """C1252770"""^^xsd:string ;

To

skos:exactMatch umls:C1252770

(this is a bit beyond what can easily be specified in an SSSOM file)
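A small piece of bespoke code could handle this case. The sketch below again treats triples as plain string tuples, with literal objects represented by their lexical value only. The expansion of the `umls:` prefix is an assumption and should be checked against the `@prefix` line of the actual file; `cui_to_exact_match` and the example subject are hypothetical.

```python
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"
# Assumed expansion of the `umls:` prefix; verify against the source TTL.
UMLS = "http://bioportal.bioontology.org/ontologies/umls/"


def cui_to_exact_match(triples, umls_ns=UMLS):
    """Turn `umls:cui "Cxxxx"` literals into skos:exactMatch IRIs."""
    cui_predicate = umls_ns + "cui"
    for s, p, o in triples:
        if p == cui_predicate:
            # The CUI string becomes the local part of a umls: IRI.
            yield (s, SKOS_EXACT_MATCH, umls_ns + o)
        else:
            yield (s, p, o)


# Hypothetical subject; the CUI value is the one quoted above.
term = "http://example.org/term"
result = list(cui_to_exact_match([(term, UMLS + "cui", "C1252770")]))
```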

If you want all of the rich relationships like

http://purl.bioontology.org/ontology/RXNORM/has_ingredient http://purl.bioontology.org/ontology/RXNORM/9863

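Then these will need to be converted to OWL existential restrictions, i.e. axioms of the form SubClassOf(R someValuesFrom Y), where R is the relationship and Y is the target class.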
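To make the target shape concrete, here is a sketch of how one direct relationship triple could be expanded into the RDF serialization of such a restriction (a blank node typed owl:Restriction with owl:onProperty and owl:someValuesFrom). Triples are plain string tuples, blank nodes are written as `_:rN` strings, and `to_existential` and the subject `EXAMPLE` are hypothetical. A complete converter would presumably also declare each relationship as an owl:ObjectProperty, which is omitted here.

```python
from itertools import count

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
OWL = "http://www.w3.org/2002/07/owl#"


def to_existential(triples, relation_predicates):
    """Expand `s R o` into `s subClassOf [onProperty R; someValuesFrom o]`."""
    bnode_ids = count()
    for s, p, o in triples:
        if p not in relation_predicates:
            yield (s, p, o)  # leave non-relationship triples untouched
            continue
        bnode = f"_:r{next(bnode_ids)}"  # one fresh blank node per axiom
        yield (s, RDFS_SUBCLASS_OF, bnode)
        yield (bnode, RDF_TYPE, OWL + "Restriction")
        yield (bnode, OWL + "onProperty", p)
        yield (bnode, OWL + "someValuesFrom", o)


rxnorm = "http://purl.bioontology.org/ontology/RXNORM/"
# EXAMPLE is a placeholder subject; has_ingredient and 9863 are quoted above.
direct = [(rxnorm + "EXAMPLE", rxnorm + "has_ingredient", rxnorm + "9863")]
axioms = list(to_existential(direct, {rxnorm + "has_ingredient"}))
```

The set of relationship predicates is passed in explicitly because it would have to be decided per ontology which predicates count as relationships.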

If you wanted to be up and running with these ontologies right now, I would write a small standalone converter that repairs the TTL into OBO-conventional OWL, and then load that into OAK like any other ontology. The advantage here is that you could load this directly into sqlite for speed.

Another approach would be to do this directly from RRF. You could use the ncbo code to get started https://github.com/ncbo/umls2rdf -- but this looks like very old perl.

Longer term, we might want to write a specific adapter for this form of RDF, driven by a mixture of declarative SSSOM mappings plus some bespoke code. But this would be best implemented after we make the sqlite loader pure Python.

Yet another approach here is to simply use the ontoportal adapter in OAK. However, this is recommended only for lookup operations; operations that involve iterating over the whole ontology will have a high latency cost.

cmungall commented 1 year ago

Short term hacky approach: https://github.com/INCATools/semantic-sql/blob/main/utils/ncbo2owl.pl