RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

owl:imports #960

Closed trypuz closed 2 years ago

trypuz commented 4 years ago

Is there any way to handle owl:imports in rdflib? E.g. I'd like to load FIBO ontology: https://spec.edmcouncil.org/fibo/ontology/master/latest/LoadFIBODev.ttl.

nicholascar commented 4 years ago

Hi @trypuz, what sort of behaviour would you like to see? The way owl:imports is handled in desktop tools like Protégé is to bring the imported ontology into memory and thus into the tool's display.

Would you like to see rdflib, when loading a graph, follow owl:imports statements and bring in the linked-to ontologies?

I think this would be quite difficult! We would need a fair amount of error handling. And, given that an import could take a long time, would the load block the lines of code that follow it, or run asynchronously?

Perhaps we could add a new parameter to the g.parse() function of a Graph that performs imports like this, with the default being retrieve_imports=False.

tgbugs commented 4 years ago

This is not simple to implement because you have no idea what format the remote ontology will come in, and rdflib can't parse OWL functional syntax. I have an implementation of import handling here, but it is a bit convoluted since it also dissociates the ontology metadata header from the rest of the file (to make fetching just the imports fast).

ashleysommer commented 4 years ago

I have a module in pySHACL called rdfutil, a collection of features built on top of rdflib that are useful within the pySHACL project. It is structured so that it can be moved out of pySHACL into its own project if others want to use it.

One of the features is an RDF loader, with smarts like auto-detecting the source type (an IO object, a text string, a file path, a web URL, etc.), as well as attempting to detect the RDF format (by file extension or by content headers). It can load into an rdflib Graph, ConjunctiveGraph, or Dataset, and supports source files with potentially multiple graphs (like TriG and JSON-LD). More importantly here, it supports owl:imports in what I consider a logical way: the behaviour is selected with a parameter and is off by default. It doesn't support OWL functional syntax.

See here: https://github.com/RDFLib/pySHACL/blob/ef171aa1dfa9148f88a1ce62e04311b67bfc2945/pyshacl/rdfutil/load.py#L46

nicholascar commented 4 years ago

But again, is it a good idea to support following owl:imports?

Thanks @ashleysommer for the module, but if following imports is to be supported, I would want to make sure it’s togglable and off by default, to prevent unexpectedly slow graph loading.

tgbugs commented 4 years ago

My view is that this is probably out of scope for core rdflib, because there are a number of OWL formats that it cannot consume and quite a bit of new code would likely have to be added and maintained. I would suggest that a separate package along the lines of rdflib-jsonld, say rdflib-owl, might be more appropriate, given the similarity of the issues around retrieving transitive contexts and handling alternate formats.

hsolbrig commented 4 years ago

The nascent functional owl project might be useful. Its first incarnation (which we're hoping to have ready to go in the next couple of weeks) will support an OWL functional syntax loader and OWL functional and OWL RDF serializers. At the moment it doesn't process import semantics, but that might be a useful enhancement. That said, it wouldn't know how to deal with Manchester, OWL XML or other variations.

nicholascar commented 4 years ago

I think that it would be OK to include a non-default option to perform imports. If we can get other OWL formats handled, great (and I've chatted to @hsolbrig about this), but this shouldn't be strictly necessary: we just handle what we can, if the option to import is turned on.

The reason I'm keen to do this is that across-the-web handling of Linked Data is difficult and rdflib could usefully allow you to do this, as long as it doesn't do it by default. People could learn the positives and negatives of OWL imports by using such an option.