Currently the Dumper.dump() method accepts a syntax argument; other frameworks, e.g. ROBOT, sometimes call this format.
The problem is that this is frequently ambiguous, especially in the context of OAK, which is pluralistic and supports multiple ways of modeling and serializing ontologies, e.g. OWL mapped to RDF and serialized as RDF/XML, or SKOS (natively RDF) serialized as Turtle.
This is compounded for loaders, where we might want to use a suffix to guess the underlying model and serialization and choose the appropriate parser. Unlike the OWLAPI, rdflib requires the RDF format to be known in advance (in my experience this is a good thing: the OWLAPI cycling through multiple parsers and models causes a lot of confusion).
Examples:
.owl clearly means the OWL data model. In the OBO universe this is conventionally mapped to RDF and serialized as RDF/XML, but outside this universe the serialization is more typically Turtle, and it may not be an RDF serialization at all
.xml means OWL/XML as far as the OWLAPI is concerned, but rdflib uses this to mean RDF/XML (which is very different!)
.rdf might typically mean some kind of RDF serialization of OWL, but SKOS is valid for the ontology-like artefacts in OAK and it can also be serialized as .rdf. The same goes for the extended RDFS-like model used by schema.org. Again, we don't know if this means RDF/XML, Turtle, N-Quads...
On top of this, there are various aliases (e.g. ttl vs turtle). Frameworks like pyoxigraph use MIME types to try to enforce some kind of standard, but this seems overkill
Proposal:
Loaders and dumpers take an optional model argument
If absent, this is inferred using syntax and sensible defaults
We encourage (but do not mandate) bipartite file suffixes to reduce ambiguity and facilitate default arguments
The form for bipartite suffixes would be .model.syntax. For example, .owl.ttl, .skos.nt
There is a potential argument for a tripartite suffix here, because of OWL mapping to RDF, and to reduce the ambiguity of .owl.xml. However, this is likely overkill.
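To make the .model.syntax convention concrete, here is a minimal sketch of how a loader might split a file name into its two parts. The function name split_suffix and the KNOWN_MODELS set are hypothetical, not existing OAK API; only the standard library is used.

```python
from pathlib import PurePath
from typing import Optional, Tuple

# Assumption: the models a bipartite suffix may name (hypothetical, not an OAK constant)
KNOWN_MODELS = {"owl", "skos"}


def split_suffix(path: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a file name into (model, syntax) following .model.syntax.

    Returns (None, syntax) when only one suffix is present or the
    second-to-last component is not a known model, and (None, None)
    when the name has no suffix at all.
    """
    # PurePath.suffixes yields e.g. ['.owl', '.ttl'] for 'go.owl.ttl'
    parts = [s.lstrip(".").lower() for s in PurePath(path).suffixes]
    if not parts:
        return None, None
    syntax = parts[-1]
    # Treat the preceding component as a model only if it is recognised,
    # so that names like 'release.2022.ttl' are not misread
    if len(parts) >= 2 and parts[-2] in KNOWN_MODELS:
        return parts[-2], syntax
    return None, syntax
```

With this, split_suffix("go.owl.ttl") gives ("owl", "ttl"), while split_suffix("go.ttl") gives (None, "ttl") and leaves the model to be supplied by the optional argument or by defaults.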
Something like
Unambiguous OWL syntaxes
.owx
.ofn
.omn
Model is optional; if specified, it MUST be owl
OWL layered on RDF
.owl.ttl (aka turtle)
.owl.nt (aka ntriples)
.owl.rdfxml (maps to xml syntax in rdflib)
.owl.jsonld
Non-canonical
.owl.xml - discouraged, but default interpretation is .owx
.owl.rdf - discouraged, but default interpretation is OWL layered on RDF and serialized as Turtle
.owl - discouraged, but default interpretation is OWL layered on RDF and serialized as Turtle(?) (or: RDF/XML, as per OBO)
SKOS
.skos.{syntax}
As per OWL layered on RDF
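The OWL and SKOS rules above could be encoded as a small lookup, sketched below. All names (resolve, the constant sets, the default_model parameter) are hypothetical. Falling back to owl when no model is given for an RDF syntax is my assumption, since the proposal only says "sensible defaults". The bare .owl entry uses the Turtle reading, which the proposal itself marks as an open question.

```python
from typing import Optional, Tuple

# Syntaxes that can only carry OWL
OWL_ONLY_SYNTAXES = {"owx", "ofn", "omn"}
# RDF syntaxes, named by suffix as in the list above
RDF_SYNTAXES = {"ttl", "nt", "rdfxml", "jsonld"}
# Models that are layered on RDF
RDF_MODELS = {"owl", "skos"}
# Discouraged suffixes and their default interpretation
NON_CANONICAL_DEFAULTS = {
    ("owl", "xml"): ("owl", "owx"),
    ("owl", "rdf"): ("owl", "ttl"),
    # Open question in the proposal: this could be rdfxml instead, as per OBO
    (None, "owl"): ("owl", "ttl"),
}


def resolve(
    model: Optional[str], syntax: str, default_model: str = "owl"
) -> Tuple[str, str]:
    """Return an unambiguous (model, syntax) pair or raise ValueError."""
    if (model, syntax) in NON_CANONICAL_DEFAULTS:
        return NON_CANONICAL_DEFAULTS[(model, syntax)]
    if syntax in OWL_ONLY_SYNTAXES:
        # Model is optional here, but if given it must be owl
        if model not in (None, "owl"):
            raise ValueError(f"syntax {syntax!r} requires model 'owl', got {model!r}")
        return "owl", syntax
    if syntax in RDF_SYNTAXES:
        # Assumption: fall back to default_model when no model is given
        chosen = model if model is not None else default_model
        if chosen not in RDF_MODELS:
            raise ValueError(f"unknown RDF-layered model {chosen!r}")
        return chosen, syntax
    raise ValueError(f"unknown syntax {syntax!r}")
```

Keeping the discouraged suffixes in their own table makes it easy to change a default (for example switching bare .owl to RDF/XML) without touching the canonical rules.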
OBO Format and OBOGraphs
TODO
Aliases
TBD: favor shorter form (i.e. suffix) as the canonical format name?
ttl = turtle
rdfx = rdfxml
nt = ntriples
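If the shorter form does become canonical, alias handling could be a single normalisation step applied before any other lookup. A minimal sketch, with the hypothetical names ALIASES and canonical_syntax and only the three aliases listed above:

```python
# Long form -> canonical short form, taken from the alias list above
ALIASES = {
    "turtle": "ttl",
    "rdfxml": "rdfx",
    "ntriples": "nt",
}


def canonical_syntax(name: str) -> str:
    """Normalise a syntax name or suffix to its canonical short form.

    Unknown names are returned unchanged (lower-cased, without a leading
    dot) so that the caller decides how to handle them.
    """
    key = name.strip().lstrip(".").lower()
    return ALIASES.get(key, key)
```

For example, canonical_syntax("Turtle") and canonical_syntax(".ttl") both give "ttl".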