INCATools / ontology-access-kit

Ontology Access Kit: A python library and command line application for working with ontologies
https://incatools.github.io/ontology-access-kit/
Apache License 2.0

Dumpers and loaders: Separate concept of syntax and datamodel #687

Open cmungall opened 10 months ago

cmungall commented 10 months ago

Currently the Dumper.dump() method accepts a syntax argument. Other frameworks, e.g. robot, sometimes call this format.

The problem here is that this is frequently ambiguous, especially in the context of OAK, which is pluralistic and supports multiple ways of modeling and serializing ontologies, e.g. OWL mapped to RDF and serialized as RDF/XML, or SKOS (natively RDF) serialized as Turtle.

This is compounded for loaders, where we might want to use a suffix to guess the underlying model and serialization and choose the appropriate parser. Unlike the owlapi, rdflib requires the format of the RDF to be known in advance (in my experience this is a good thing: there is a lot of confusion caused by the owlapi cycling through multiple parsers and models).

Examples:

On top of this, there are various aliases (e.g. ttl vs turtle). Frameworks like pyoxigraph use MIME types to try to enforce some kind of standard, but this seems like overkill.
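To make the alias problem concrete, here is a minimal sketch of alias normalization. The table and the `canonical_syntax` helper are hypothetical and not part of OAK; only the ttl/turtle pair comes from this thread, and choosing the short suffix as canonical is an assumption (see the TBD under Aliases).

```python
# Hypothetical alias table: alternative names mapped onto one canonical name.
# Only ttl/turtle is taken from the discussion; the other entries are illustrative.
SYNTAX_ALIASES = {
    "turtle": "ttl",
    "ntriples": "nt",
    "rdf/xml": "rdfxml",
}


def canonical_syntax(name: str) -> str:
    """Normalize a user-supplied syntax name to its canonical short form.

    Leading dots, surrounding whitespace, and case are ignored. Names that are
    not in the alias table are returned unchanged (after normalization).
    """
    key = name.strip().lower().lstrip(".")
    return SYNTAX_ALIASES.get(key, key)
```

With a table like this, `canonical_syntax("turtle")` and `canonical_syntax(".TTL")` would both resolve to the same name, so downstream code only has to handle one spelling.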

Proposal:

The name for a bipartite syntax would take the form .model.syntax, for example .owl.ttl or .skos.nt.
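A rough sketch of how such a bipartite name could be parsed, both from an explicit argument and from a file path. `FormatSpec`, `parse_format`, `format_from_path`, and `KNOWN_MODELS` are hypothetical names used for illustration, not existing OAK API, and the set of models is limited to the two mentioned in this issue.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional

# Assumption: only the two models named in this issue are recognized.
KNOWN_MODELS = {"owl", "skos"}


class FormatSpec(NamedTuple):
    model: Optional[str]  # e.g. "owl" or "skos"; None if not specified
    syntax: str           # e.g. "ttl" or "nt"


def parse_format(name: str) -> FormatSpec:
    """Split '.owl.ttl', 'owl.ttl', or 'ttl' into (model, syntax)."""
    parts = [p for p in name.strip().lower().split(".") if p]
    if len(parts) == 1:
        return FormatSpec(None, parts[0])
    if len(parts) == 2:
        return FormatSpec(parts[0], parts[1])
    raise ValueError(f"expected 'syntax' or 'model.syntax', got {name!r}")


def format_from_path(path: str) -> FormatSpec:
    """Guess (model, syntax) from the last one or two suffixes of a path."""
    suffixes = [s.lstrip(".").lower() for s in PurePosixPath(path).suffixes[-2:]]
    if len(suffixes) == 2 and suffixes[0] not in KNOWN_MODELS:
        # The penultimate suffix is not a model name (e.g. 'go-basic.v1.ttl'),
        # so treat only the final suffix as meaningful.
        suffixes = suffixes[-1:]
    return parse_format(".".join(suffixes))
```

Restricting the penultimate suffix to known model names is a design choice made here so that dotted file names are not misread as carrying a model. A path with no suffix raises `ValueError` rather than guessing.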

There is a potential argument for a tripartite model here, because of owl mapping to rdf, and to reduce the ambiguity of .owl.xml. However, this is likely overkill.

Something like

Unambiguous OWL syntaxes

Model optional, if specified, MUST be owl
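The rule above could be enforced with a small check like the following sketch. The helper name `effective_model` is hypothetical, and the set of OWL-only syntaxes is an assumption for illustration (conventional suffixes for OWL functional syntax, Manchester syntax, and OWL/XML); the actual list for this proposal is not given here.

```python
from typing import Optional

# Assumption: illustrative suffixes for syntaxes that can only serialize OWL.
OWL_ONLY_SYNTAXES = {"ofn", "omn", "owx"}


def effective_model(model: Optional[str], syntax: str) -> Optional[str]:
    """Apply the rule: for unambiguous OWL syntaxes the model is optional,
    but if it is specified it must be 'owl'.

    For any other syntax the model is returned unchanged.
    """
    if syntax in OWL_ONLY_SYNTAXES:
        if model not in (None, "owl"):
            raise ValueError(
                f"syntax {syntax!r} implies model 'owl', but got {model!r}"
            )
        return "owl"
    return model
```

Under this sketch, both `ofn` and `owl.ofn` resolve to the OWL model, while a name such as `skos.ofn` is rejected.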

OWL layered on RDF

Non-canonical

SKOS

As per OWL layered on RDF

OBO Format and OBOGraphs

TODO

Aliases

TBD: favor shorter form (i.e. suffix) as the canonical format name?

balhoff commented 10 months ago

.owl - discouraged, but default interpretation is OWL layered on RDF and serialized as Turtle(?) (or: RDF/XML, as per OBO)

Most packages I'm familiar with assume that .owl is RDF/XML (e.g., Jena, Blazegraph).