BFO-ontology / BFO-2020

A repository for BFO 2020 artifacts specified in ISO 21838-2:2020

Filename extension for the RDF/XML serialization is incorrect #92

Open ElisaKendall opened 4 months ago

ElisaKendall commented 4 months ago

Describe the bug

The file extension registered with the official media type (formerly called MIME type) for RDF or OWL files serialized in RDF/XML, application/rdf+xml, is '.rdf', not '.owl'. There is an OWL XML syntax that sometimes uses the '.owl' extension, so using the wrong extension may cause problems for some parsers. Some tools 'help' the developer by adding ".owl" by default, which many ontologists and other OWL users may not realize.

See https://www.iana.org/assignments/media-types/application/rdf+xml for details. The complete list of media types is available at https://www.iana.org/assignments/media-types/media-types.xhtml.

alanruttenberg commented 4 months ago

There is widespread use of .owl to mean the RDF/XML file. See BioPortal and OBO Foundry. I've yet to see a tool that doesn't understand .owl. The OWL/XML format has a different extension: .owx. I don't plan to change this.

ElisaKendall commented 4 months ago

@alanruttenberg That's of course your choice. Other folks do use .owl to mean the XML syntax for OWL as I mentioned, even if the suffix is supposed to be slightly different. The parsers I use are capable of detecting the difference (at least some of them). But, there are a number of folks working on ISO standards that depend on BFO, and for example, for work in pharma for the identification of medicinal products we have a rule about correct serialization. We can make an exception and call it out as non-standard in our mapping to manufacturing process ontologies that depend on BFO. The NIST / IOF effort is also making that exception. But it is an exception that would not be difficult to address in a new version.

There is what I believe is a bug in either the OWL API or Protege that automatically creates .owl as the extension of any ontology you create from scratch, unless you actively change it via the UI -- I suspect that the reason you see so many in the OBO Foundry and BioPortal is because that's the default in Protege. That doesn't mean it's correct. There are other things that are often included in RDF/XML serialized ontologies developed in Protege that could be improved as well. I've been talking with Mark Musen to work with Matthew et al to prioritize them. Hopefully the default will change and you'll see many more .rdf files than .owl files once that happens.

Other than due to historical reasons, is there a technical reason why you don't want to use the IANA media type? You already do that for Turtle, and should also consider publishing in JSON-LD, which is increasingly widely used. Also, there are lots of RDF tools that don't necessarily understand OWL but allow one to load OWL or use it in applications. Those tools typically expect a .rdf filename extension. BFO might be more widely used beyond the OWL DL / OBO Foundry community if the change is made.
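For readers following the point about parsers that detect the difference regardless of the suffix: the sketch below is one way such detection could work, and is not taken from any tool named in this thread. It assumes an RDF/XML document has the usual `rdf:RDF` root element (common, though not strictly required) and that an OWL/XML document has an `Ontology` root in the OWL namespace. The function name `sniff_xml_serialization` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"


def sniff_xml_serialization(source):
    """Classify an XML ontology file by its root element, ignoring its file name.

    `source` is a path or a binary file object. Returns "RDF/XML", "OWL/XML",
    or "unknown". Hypothetical helper for illustration only.
    """
    # The first "start" event reported by iterparse is the root element,
    # so the loop returns on its first iteration.
    for _event, root in ET.iterparse(source, events=("start",)):
        if root.tag == f"{{{RDF_NS}}}RDF":
            return "RDF/XML"
        if root.tag == f"{{{OWL_NS}}}Ontology":
            return "OWL/XML"
        return "unknown"
    return "unknown"


# Minimal in-memory documents standing in for files named *.owl or *.rdf.
rdfxml_doc = b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
owlxml_doc = b'<Ontology xmlns="http://www.w3.org/2002/07/owl#"/>'

print(sniff_xml_serialization(io.BytesIO(rdfxml_doc)))
print(sniff_xml_serialization(io.BytesIO(owlxml_doc)))
```

A tool that sniffs content this way is indifferent to whether the file is named .owl or .rdf; a tool that dispatches purely on the suffix is not, which is the case this issue is concerned with.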

alanruttenberg commented 4 months ago

The original OWL spec says:

2.3 MIME type ... As file extension, we recommend to use either .rdf or .owl.

The normative exchange format for OWL is RDF/XML

The BFO IRI is fixed and shouldn't change. It ends with .owl. It should be the case that the IRI by which an ontology is accessed is the same as the ontology IRI.

ROBOT documents the file extensions it uses and it uses .owl for RDF/XML and .owx for OWL/XML

The XML syntax document recommends use of .owx as file extension.

All the examples in the OWL 2 Structural Specification use .owl as the file extension.

As far as JSON-LD is concerned, please submit a separate issue for this request. I'm not familiar with JSON-LD but perhaps a future build process could produce it.

alanruttenberg commented 4 months ago

Given the widespread use of the .owl extension and the mentions in the specifications, I would suggest submitting a bug report to developers of tools that malfunction when given a file or IRI with the .owl suffix.
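The extension conventions cited in this thread can be summarized as a small lookup table. This is only an illustrative sketch: the table and the helper name `guess_serialization` are hypothetical, the `.owl` row reflects the convention described above rather than an IANA registration, and the file names in the usage lines are made up.

```python
from pathlib import PurePosixPath

# Extension -> (serialization, media type), as discussed in this thread.
EXTENSION_CONVENTIONS = {
    ".rdf": ("RDF/XML", "application/rdf+xml"),  # extension listed in the IANA registration
    ".owl": ("RDF/XML", "application/rdf+xml"),  # widespread convention (OBO Foundry, BioPortal, ROBOT)
    ".owx": ("OWL/XML", "application/owl+xml"),  # recommended by the OWL 2 XML syntax document
    ".ttl": ("Turtle", "text/turtle"),
}


def guess_serialization(name):
    """Return (serialization, media type) for a file name or IRI path, or None if unknown."""
    suffix = PurePosixPath(name).suffix.lower()
    return EXTENSION_CONVENTIONS.get(suffix)


print(guess_serialization("example.owl"))
print(guess_serialization("example.owx"))
print(guess_serialization("example.json"))
```

Under this table, .owl and .rdf resolve to the same serialization, so the disagreement in the thread is about which name a tool should expect, not about the file contents.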