INCATools / ontology-access-kit

Ontology Access Kit: A python library and command line application for working with ontologies
https://incatools.github.io/ontology-access-kit/
Apache License 2.0

FHIR dumper updates #369

Open cmungall opened 1 year ago

cmungall commented 1 year ago

Overview

Followed from:

We have a preliminary dumper: https://github.com/INCATools/ontology-access-kit/releases/tag/v0.1.57

Bugs

Bug list

Bug details

1. CLI doesn't work; only the Python API does

If you try running via the CLI with either (a) OWL or (b) Obographs JSON input, it doesn't work. You'll get errors such as the one below.

1.1. AttributeError: 'NoneType' object has no attribute 'version'

Example command: python src/oaklib/cli.py --stacktrace -i ~/Desktop/go-nucleus.json dump -o ~/Desktop/go.json -O fhirjson

The issue is that source.meta is None, but it's supposed to be an object with a version attribute. When I run via the Python API, using OboGraphToFhirJsonConverter or OboGraphToFhirNpmConverter directly, it works fine. However, I see that the CLI goes through pronto_implementation.py, even when I pass an (Obographs) JSON input. Perhaps the problem has something to do with that.

Trace

```
python src/oaklib/cli.py --stacktrace -i ~/Desktop/go-nucleus.json dump -o ~/Desktop/go.json -O fhirjson
Traceback (most recent call last):
  File "/Users/joeflack4/projects/ontology-access-kit/src/oaklib/cli.py", line 5510, in <module>
    main()
  File "/Users/joeflack4/Library/Caches/pypoetry/virtualenvs/oaklib-VudrcZnD-py3.9/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/joeflack4/Library/Caches/pypoetry/virtualenvs/oaklib-VudrcZnD-py3.9/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/joeflack4/Library/Caches/pypoetry/virtualenvs/oaklib-VudrcZnD-py3.9/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/joeflack4/Library/Caches/pypoetry/virtualenvs/oaklib-VudrcZnD-py3.9/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/joeflack4/Library/Caches/pypoetry/virtualenvs/oaklib-VudrcZnD-py3.9/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/joeflack4/projects/ontology-access-kit/src/oaklib/cli.py", line 2352, in dump
    impl.dump(output, syntax=output_type, **kwargs)
  File "/Users/joeflack4/projects/ontology-access-kit/src/oaklib/implementations/pronto/pronto_implementation.py", line 590, in dump
    super().dump(path, syntax, **kwargs)
  File "/Users/joeflack4/projects/ontology-access-kit/src/oaklib/interfaces/dumper_interface.py", line 73, in dump
    converter.dump(ogdoc, target=path, **kwargs)
  File "/Users/joeflack4/projects/ontology-access-kit/src/oaklib/converters/obo_graph_to_fhir_converter.py", line 100, in dump
    cs = self.convert(
  File "/Users/joeflack4/projects/ontology-access-kit/src/oaklib/converters/obo_graph_to_fhir_converter.py", line 186, in convert
    self._convert_graph(
  File "/Users/joeflack4/projects/ontology-access-kit/src/oaklib/converters/obo_graph_to_fhir_converter.py", line 229, in _convert_graph
    target.version = source.meta.version
AttributeError: 'NoneType' object has no attribute 'version'
```
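The traceback ends at `target.version = source.meta.version` in `_convert_graph`. One way to make the converter tolerate a graph without metadata is to guard that attribute access. The sketch below uses minimal stand-in dataclasses (hypothetical, not oaklib's actual obograph model) and a hypothetical `graph_version` helper; it does not explain why the CLI path yields `meta = None` in the first place.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the Obographs Graph/Meta objects; the real
# classes in oaklib's obograph datamodel have many more fields.
@dataclass
class Meta:
    version: Optional[str] = None


@dataclass
class Graph:
    id: str
    meta: Optional[Meta] = None


def graph_version(source: Graph) -> Optional[str]:
    """Return the graph's version, or None when the graph has no metadata.

    Avoids the AttributeError raised by `source.meta.version` when
    `source.meta` is None.
    """
    if source.meta is None:
        return None
    return source.meta.version
```

In the converter, `target.version` would then only be set when `graph_version(source)` returns something other than None.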

Improvements

Sub-task list

Sub-task details

1. Documentation

2.1 Edges: simple

2.2 Edges: more

3. Mapping to codes: use CURIEs or CURIE local parts?

Joe: a. The consensus seems to be to use the valueCoding (Coding) data type for these. A Coding expects a code system URI, not a CURIE. b. We could use valueString instead and store CURIEs. If so, where would the curie_map/prefix_map go? Expanding CURIEs would be impossible without adding server functionality. I know that the FHIR RDF group is working on something CURIE-related, but I don't know the details.

Also, what should we use for the URI: the canonical ontology URI, or the URI of a copy of the local CodeSystem on the FHIR server (if one exists)?
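To make options (a) and (b) concrete, here is a minimal sketch of both shapes of a mapping in concept.property. It assumes a hypothetical prefix-to-system table that uses canonical OBO ontology PURLs (one of the two URI choices still open above) and a hypothetical property code `exactMatch`; the `use_local_part` switch covers the "CURIE or CURIE local part" question in the heading.

```python
from typing import Dict, Tuple

# Hypothetical prefix -> Coding.system table. Canonical ontology PURLs are
# assumed here only for illustration; the server-hosted CodeSystem URI is
# the other option under discussion.
SYSTEM_URIS: Dict[str, str] = {
    "GO": "http://purl.obolibrary.org/obo/go.owl",
    "UBERON": "http://purl.obolibrary.org/obo/uberon.owl",
}


def split_curie(curie: str) -> Tuple[str, str]:
    """Split 'GO:0005634' into ('GO', '0005634')."""
    prefix, sep, local_id = curie.partition(":")
    if not (sep and prefix and local_id):
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id


def as_value_coding(property_code: str, curie: str, use_local_part: bool) -> dict:
    """Option (a): a concept.property entry with valueCoding (system + code)."""
    prefix, local_id = split_curie(curie)
    system = SYSTEM_URIS[prefix]  # KeyError for prefixes with no known system
    code = local_id if use_local_part else curie
    return {"code": property_code, "valueCoding": {"system": system, "code": code}}


def as_value_string(property_code: str, curie: str) -> dict:
    """Option (b): a concept.property entry keeping the CURIE as a plain string."""
    return {"code": property_code, "valueString": curie}
```

For example, `as_value_coding("exactMatch", "GO:0005634", use_local_part=True)` yields a Coding with system `http://purl.obolibrary.org/obo/go.owl` and code `0005634`. Option (b) needs no prefix table at export time, but leaves CURIE expansion to whoever consumes the resource.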

Do we want all mapped terms to be populated into another CodeSystem resource as discussed in the "Foreign concepts -> External code systems" section?

4. Ontology metadata

Ontology metadata is incomplete.
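As a rough illustration of the kind of mapping involved, the sketch below fills top-level CodeSystem fields from one Obographs JSON graph (its `id` and `meta`). Which metadata predicates to map is an assumption: only the Dublin Core title predicate is handled here, and `status`/`content` are set to placeholder values because FHIR R4 requires both. It also treats a null `meta` as empty rather than failing.

```python
from typing import Any, Dict, Optional

# Assumed predicate for the ontology title in Obographs basicPropertyValues.
DC_TITLE = "http://purl.org/dc/elements/1.1/title"


def codesystem_header(graph: Dict[str, Any]) -> Dict[str, Any]:
    """Build top-level CodeSystem fields from one Obographs JSON graph dict."""
    meta: Dict[str, Any] = graph.get("meta") or {}  # tolerate missing/null meta
    title: Optional[str] = None
    for pv in meta.get("basicPropertyValues", []):
        if pv.get("pred") == DC_TITLE:
            title = pv.get("val")
            break
    header: Dict[str, Any] = {
        "resourceType": "CodeSystem",
        "url": graph["id"],
        "status": "draft",      # required by FHIR; placeholder value
        "content": "complete",  # required in R4; assumes a full export
    }
    if meta.get("version"):
        header["version"] = meta["version"]
    if title:
        header["title"] = title
    return header
```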

5. Synonym updates

More information in comments.

6. Foreign concepts -> External code systems

Joe: The FHIR RDF group and others recommended that any foreign concepts within an ontology (e.g., say Mondo had a GO class declared somewhere) all be put into a separate CodeSystem resource JSON. I don't know how common this is.

This is tricky, though. Populating concepts into another CodeSystem resource is one thing, but we would also need a way to extract other information about that ontology to populate the rest of the CodeSystem fields. We could do that by taking the ontology's URI, downloading it, and running it through the dumper as well. This would then be a recursive operation, as those ontologies may link to other ontologies.
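A minimal sketch of the two steps described above, assuming concepts are identified by CURIEs whose prefix identifies the owning ontology (not always true in practice). `referenced_prefixes` is a hypothetical callable; in a real pipeline it would fetch the referenced ontology and report which foreign prefixes it uses. An iterative worklist with a `seen` set is used instead of plain recursion so that ontologies that reference each other cannot loop forever.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple


def split_home_and_foreign(
    curies: Iterable[str], home_prefix: str
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Separate an ontology's own concepts from foreign ones, grouped by prefix."""
    home: List[str] = []
    foreign: Dict[str, List[str]] = defaultdict(list)
    for curie in curies:
        prefix = curie.split(":", 1)[0]
        if prefix == home_prefix:
            home.append(curie)
        else:
            foreign[prefix].append(curie)
    return home, dict(foreign)


def ontologies_to_dump(start: str, referenced_prefixes: Callable[[str], Set[str]]) -> List[str]:
    """Order in which to dump ontologies when following foreign references.

    The `seen` set guarantees each ontology is visited once, even with
    mutual references (e.g. GO <-> UBERON).
    """
    order: List[str] = []
    seen = {start}
    stack = [start]
    while stack:
        prefix = stack.pop()
        order.append(prefix)
        for other in sorted(referenced_prefixes(prefix)):
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return order
```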

7. Axioms

Joe: I haven't looked into this much yet; maybe all of this will be covered while working on edges.

8. Property/relationship/edge descriptions?

This is the most minor item here. CodeSystem.property.description is optional.

For example, for RO, we could use the OLS API to get the descriptions. We could also include a link to the class's page (example).
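In FHIR R4, the optional `description` sits on the CodeSystem-level property declaration, next to `code`, `uri`, and `type`. A small sketch of emitting a declaration that only carries a description when one was found (for instance via OLS; the lookup itself is not shown). The helper name and the choice of `type` `"code"` for relations between concepts of the same system are assumptions.

```python
from typing import Dict, Optional


def declare_property(
    code: str, uri: str, type_: str = "code", description: Optional[str] = None
) -> Dict[str, str]:
    """Build one CodeSystem.property entry; description is added only if known."""
    prop = {"code": code, "uri": uri, "type": type_}
    if description:
        prop["description"] = description
    return prop
```

For example, `declare_property("part_of", "http://purl.obolibrary.org/obo/BFO_0000050")` yields a declaration without a description, which is valid because the field is optional.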

9. ValueSets?

I'm not sure yet if this is something that would be desired, and what kinds of ValueSets might be constructed from a CodeSystem.

10. Simple CLI

Issues

Problem: We would like to run the simple command runoak -i OWL_PATH dump -o OUTPATH -O fhirjson and have it complete quickly. Currently it takes hours for ontologies like Mondo because of rdflib.

Solutions:
a. Somehow convert to Obographs under the hood to speed things up. This requires some awkward changes in the CLI: main() runs before dump(), and main() has no idea it's producing FHIR, so it doesn't know to create an Obograph instead of using rdflib. That information only becomes available at the dump() phase, after conversion.
b. Pre-convert to a sqlite db and use that. Requires: https://github.com/INCATools/ontology-access-kit/issues/405
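One generic way around the ordering problem in (a) is to defer parsing: main() records only the input path, and dump() asks for the representation it needs once the output format is known. The sketch below is a hypothetical pattern, not how oaklib's CLI is currently structured; the `LazyInput` class and the loader names are invented for illustration.

```python
from typing import Any, Callable, Dict


class LazyInput:
    """Hypothetical deferred loader for a CLI input.

    `main()` would construct this with just the path; `dump()` would call
    `get("obograph")` for FHIR output, so an expensive rdflib parse is
    never triggered unless some command actually asks for it.
    """

    def __init__(self, path: str, loaders: Dict[str, Callable[[str], Any]]):
        self.path = path
        self._loaders = loaders
        self._cache: Dict[str, Any] = {}

    def get(self, kind: str) -> Any:
        # Parse at most once per representation; unknown kinds raise KeyError.
        if kind not in self._cache:
            self._cache[kind] = self._loaders[kind](self.path)
        return self._cache[kind]
```

The design choice is simply to move the "which parser?" decision to the point where the output format is known, which is the information main() lacks.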

Options to support

i. include_all_predicates (bool): I was thinking a variation on this could be something like retention (with options such as 'maximal', 'minimal', etc.), but a boolean is fine for now.
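A sketch of what the boolean option could control when selecting Obographs edges (`sub`/`pred`/`obj`) for export. The default predicate subset below (is_a plus part_of) is an assumption, since the issue does not say what should be kept when the flag is off. A future `retention` option could map names like 'minimal' or 'maximal' to different predicate sets in the same way.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical default subset used when include_all_predicates is False.
DEFAULT_PREDICATES: Set[str] = {
    "is_a",
    "http://purl.obolibrary.org/obo/BFO_0000050",  # part_of
}


def select_edges(
    edges: Iterable[Dict[str, str]], include_all_predicates: bool = False
) -> List[Dict[str, str]]:
    """Filter Obographs edges for export according to the option."""
    if include_all_predicates:
        return list(edges)
    return [e for e in edges if e["pred"] in DEFAULT_PREDICATES]
```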

11. Filter

I'm not sure how much demand there would be for this, but aehrc/fhir-owl supports it. More information is in the CodeSystem docs.
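For context, a CodeSystem declares which filters it supports in CodeSystem.filter (code, description, operator, value), and the server evaluates them, e.g. during ValueSet expansion. The sketch shows an illustrative `is-a` declaration plus what that filter would select: FHIR defines `is-a` as the given concept itself plus everything transitively below it (unlike `descendent-of`, which excludes the concept itself). Edges are assumed to be (child, parent) pairs.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple

# Illustrative CodeSystem.filter declaration.
IS_A_FILTER = {
    "code": "concept",
    "description": "Concepts subsumed by the given code",
    "operator": ["is-a"],
    "value": "A code from this CodeSystem",
}


def descendants_or_self(is_a_edges: Iterable[Tuple[str, str]], root: str) -> Set[str]:
    """Codes an `is-a` filter on `root` would match (breadth-first walk)."""
    children: Dict[str, Set[str]] = defaultdict(set)
    for child, parent in is_a_edges:
        children[parent].add(child)
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```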

12. Labels -> concept.property

This is simple and potentially high value: it would enable text search. More info: https://build.fhir.org/codesystem-operation-find-matches.html
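A sketch of exposing each concept's label both as `display` and as a string property. The property code `label` is an assumption (it would also need a matching CodeSystem.property declaration), and whether a given server's $find-matches implementation will search it depends on that server.

```python
from typing import Dict, Optional

# Hypothetical property declaration for labels.
LABEL_PROPERTY: Dict[str, str] = {"code": "label", "type": "string"}


def concept_with_label(code: str, label: Optional[str]) -> Dict:
    """Concept whose label appears as display and as a string property."""
    concept: Dict = {"code": code}
    if label:
        concept["display"] = label
        concept["property"] = [{"code": LABEL_PROPERTY["code"], "valueString": label}]
    return concept
```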

13. Non-is_a hierarchies

There is a need for hierarchies, defined via CodeSystem.hierarchyMeaning, to support other cases, e.g. part_of. We could add a --hierarchy-predicate flag that defaults to is_a. FHIR supports is-a, part-of, grouped-by, and classified-with as hierarchyMeaning values; this list is reportedly not extensible.
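A sketch of how the proposed flag's value could be translated into hierarchyMeaning and used to pick the edges that form the hierarchy. The predicate-to-meaning table is an assumption: only is_a and part_of have an obvious ontology counterpart, and any other predicate is rejected because the FHIR code list cannot be extended.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping from a --hierarchy-predicate value to hierarchyMeaning.
HIERARCHY_MEANING: Dict[str, str] = {
    "is_a": "is-a",
    "http://purl.obolibrary.org/obo/BFO_0000050": "part-of",
}


def hierarchy_for(
    predicate: str, edges: Iterable[Dict[str, str]]
) -> Tuple[str, Dict[str, List[str]]]:
    """Return (hierarchyMeaning, child -> parents) for the chosen predicate."""
    try:
        meaning = HIERARCHY_MEANING[predicate]
    except KeyError:
        raise ValueError(f"no FHIR hierarchyMeaning for predicate {predicate!r}") from None
    parents: Dict[str, List[str]] = {}
    for e in edges:
        if e["pred"] == predicate:
            parents.setdefault(e["sub"], []).append(e["obj"])
    return meaning, parents
```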

14. Export in FHIR NPM package format

We need to follow what is specified here: https://confluence.hl7.org/display/FHIR/NPM+Package+Specification
An example package: http://hl7.org/fhir/us/core/package.tgz
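A minimal sketch of the tarball layout, using only the standard library: a gzipped tar whose entries sit under a top-level `package/` folder, with a `package.json` manifest next to the resource files. This is deliberately incomplete relative to the specification (no `.index.json`, only a few manifest fields), and the package name and file names in the usage are hypothetical.

```python
import io
import json
import tarfile
from typing import BinaryIO, Dict


def write_fhir_package(
    fileobj: BinaryIO,
    name: str,
    version: str,
    resources: Dict[str, dict],
    fhir_version: str = "4.0.1",
) -> None:
    """Write a minimal NPM-style FHIR package (tar.gz) to `fileobj`."""
    manifest = {"name": name, "version": version, "fhirVersions": [fhir_version]}
    files = {"package/package.json": manifest}
    files.update({f"package/{fname}": res for fname, res in resources.items()})
    with tarfile.open(fileobj=fileobj, mode="w:gz") as tar:
        for path, content in files.items():
            data = json.dumps(content, indent=2).encode("utf-8")
            info = tarfile.TarInfo(name=path)
            info.size = len(data)  # size must be set before addfile
            tar.addfile(info, io.BytesIO(data))
```

Usage: `write_fhir_package(open("package.tgz", "wb"), "example.go", "0.0.1", {"CodeSystem-go.json": codesystem})`.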

15. SPARQL / DL queries (-> ValueSet, etc)

The main idea here is to support extracting ValueSets from the CodeSystem. This would probably take the form of a DL or SPARQL query that selects a subset of concepts, but there may be other use cases.
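Two ways the output of such a selection could be expressed as a ValueSet, sketched under assumptions: an extensional ValueSet that freezes the result of a query as an explicit code list, and an intensional one that uses FHIR's own `is-a` filter and leaves expansion to the terminology server (which only covers simple subsumption, not arbitrary SPARQL/DL). The ValueSet URLs used with these helpers would be hypothetical.

```python
from typing import Iterable


def extensional_valueset(url: str, system: str, codes: Iterable[str]) -> dict:
    """ValueSet listing codes explicitly, e.g. the results of a query."""
    return {
        "resourceType": "ValueSet",
        "url": url,
        "status": "draft",
        "compose": {"include": [{
            "system": system,
            "concept": [{"code": c} for c in sorted(set(codes))],
        }]},
    }


def intensional_valueset(url: str, system: str, root: str) -> dict:
    """ValueSet defined by an is-a filter, expanded by the server."""
    return {
        "resourceType": "ValueSet",
        "url": url,
        "status": "draft",
        "compose": {"include": [{
            "system": system,
            "filter": [{"property": "concept", "op": "is-a", "value": root}],
        }]},
    }
```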

Related

joeflack4 commented 1 year ago

I made a dump of my/TIMS to-dos and broke them down into sub-task list and sub-task details sections.

joeflack4 commented 1 year ago

@cmungall Regarding '4.2. synonym updates', I spoke with Shahim, learned a bit more, and updated the OP with this information:

Use oio as a code system, or map synonym types to SNOMED codes?

Questions

Which types to include? (e.g., exactMatch, broadMatch, etc.)

Everyone I spoke to also said, I believe, to include only exact matches as concept.designation entries and to put any others in concept.property. However, we might want to double-check; I can ask some FHIR people.

Which predicates to use?

SNOMED predicates are required, but the ontology's original synonym type can be included too.
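A sketch combining the points above: exact synonyms from Obographs (`pred`/`val`) become concept.designation entries whose `use` is the SNOMED CT "Synonym" concept (900000000000013009, listed in FHIR's designation-use value set), while broad, narrow, and related synonyms go to concept.property. The property code `otherSynonym` is hypothetical, and carrying the ontology's original synonym type alongside is not shown.

```python
from typing import Dict, List, Tuple

SNOMED = "http://snomed.info/sct"
# SNOMED CT "Synonym" designation-use concept.
SNOMED_SYNONYM = {"system": SNOMED, "code": "900000000000013009", "display": "Synonym"}
# Hypothetical property code for non-exact synonyms.
OTHER_SYNONYM_PROPERTY = "otherSynonym"


def split_synonyms(synonyms: List[Dict[str, str]]) -> Tuple[List[dict], List[dict]]:
    """Split Obographs synonyms into (designations, properties)."""
    designations: List[dict] = []
    properties: List[dict] = []
    for syn in synonyms:
        if syn.get("pred") == "hasExactSynonym":
            designations.append({"use": dict(SNOMED_SYNONYM), "value": syn["val"]})
        else:
            properties.append({"code": OTHER_SYNONYM_PROPERTY, "valueString": syn["val"]})
    return designations, properties
```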