buda-base / lds-pdi

http://purl.bdrc.io BDRC Linked Data Server
Apache License 2.0

Ontology service issues when using new ontologies tree (using imports) #124

Closed MarcAgate closed 5 years ago

MarcAgate commented 5 years ago

Ontology loading is based on the github/owl-schema ttl files, which are read into a model made available to the ontology service. Jena builds a Model from the ttl and calls ldspdi URIs whenever the ttl includes owl:imports statements. For instance, reading admin.ttl into a Model implies, once the ttl has been fetched from github, six additional requests to ldspdi:

owl:imports <http://purl.bdrc.io/ontology/adm/types/Access> ;
owl:imports <http://purl.bdrc.io/ontology/adm/types/License> ;
owl:imports <http://purl.bdrc.io/ontology/adm/types/OutlineType> ;
owl:imports <http://purl.bdrc.io/ontology/adm/types/Status> ;
owl:imports <http://purl.bdrc.io/ontology/adm/types/TermsOfUse> ;
owl:imports bdo: ;

FIRST ISSUE: the first consequence of this process is that the ontology service cannot be initialized at ldspdi startup, because initialization requires ldspdi to be already running.

SECOND ISSUE: now suppose we move the ontology service initialization from ldspdi boot to a static clause in the OntData class (i.e. the ontology service main class). If you then want to browse one ontology (let's say ontology/admin), you make a request to the ldspdi ontology service endpoint. The static init code is executed (admin.ttl is fetched from github) and the imports are processed through the ldspdi ontology service endpoint, which is not yet initialized (since we are in the process of initializing it).
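The circularity described in the second issue can be illustrated with a small in-process sketch. This is not the actual OntData code: `OntServiceSketch`, `handleRequest` and `init` are hypothetical names, and the HTTP round trip to ldspdi is simulated by a direct method call. It only shows why an initializer that resolves imports through the service it is initializing cannot complete.

```java
import java.util.List;

// Hypothetical stand-in for the ontology service; NOT the real OntData class.
class OntServiceSketch {
    private static boolean initialized = false;
    private static boolean initializing = false;

    // Simulates a request to the ontology service endpoint.
    static String handleRequest(String uri) {
        if (!initialized) {
            init(); // lazy initialization on first request
        }
        return "model for " + uri;
    }

    // Simulates reading admin.ttl: each owl:imports target is resolved
    // through the same endpoint that is still being initialized.
    private static void init() {
        if (initializing) {
            throw new IllegalStateException(
                "ontology service requested while it is still initializing");
        }
        initializing = true;
        try {
            List<String> imports =
                List.of("http://purl.bdrc.io/ontology/adm/types/Access");
            for (String imported : imports) {
                handleRequest(imported); // re-enters init() and fails
            }
            initialized = true;
        } finally {
            initializing = false;
        }
    }
}
```

In this sketch the very first call to `OntServiceSketch.handleRequest(...)` fails with an `IllegalStateException`. In a real deployment the nested request would go over HTTP, so the symptom would more likely be a hang or an error response than an exception, but the dependency loop is the same.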

I realized these two dead ends this weekend after having hit the wall for three days.

The consequence of this is that we have to redesign and rewrite the ontology service if we want to use imports of ontologies lying in our domain (ldspdi domain).

Moreover, we need to discuss the requirements for the jsp-based "ontology views" we want to have. With the previous system, displaying aut: showed only auth classes and props. Now I am told that aut: must import adm: (apparently to comply with some shacl validation process), so the aut: view includes adm:, and since adm: includes Types and bdo, we actually end up with a "global view" instead of an aut: view.

In general, if future shacl validation requires some ontology tweaking, I don't think this tweaking should go into the ontologies used by the ontology service. I think these are two different use cases for ontologies, and therefore we should have a different Ontology Model for shacl validation (based on the same ttl), available through a dedicated shacl endpoint on ldspdi. I am pretty sure we will need such an endpoint, since validation cases are going to be quite numerous and sometimes specific. However, in all cases, and as far as I understand shacl validation, it should be enough to validate data against an Ontology Model aggregating all our ontologies.

eroux commented 5 years ago

I think the first issue can be solved by tweaking the imports using a FileManager and/or OntDocumentManager. The idea is the following:

When we read a file on github with Jena, it should ask the filemanager:

"I want to get the file at http://purl.bdrc.io/ontology/adm/types/License"

and the filemanager should say either:

I don't know much about shacl, I'll let you and Chris discuss this.

MarcAgate commented 5 years ago

Well, you cannot query ldspdi until ldspdi is started, that's the first issue.

eroux commented 5 years ago

yes, my solution does not involve querying ldspdi, the idea is that during model.read, Jena asks the FileManager to get the imported models, and doesn't query the url of the imported model directly
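The idea of resolving imports through a lookup instead of fetching the import URL can be sketched as follows. This is a plain-Java illustration, not Jena code: `ImportLocationMapper` is a hypothetical class, and the local path used in the usage note is an assumed location, not the actual owl-schema layout. In Jena itself, `OntDocumentManager.addAltEntry(docURI, locationURL)` provides this kind of alternative-location mapping.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an import location mapper. In Jena this role is
// played by FileManager / OntDocumentManager, not by this class.
class ImportLocationMapper {
    // ontology URI -> location of an already available copy
    private final Map<String, String> altLocations = new HashMap<>();

    // Register an alternative location for an ontology URI.
    void addAltEntry(String ontologyUri, String location) {
        altLocations.put(ontologyUri, location);
    }

    // True when the import can be served without contacting its own URL.
    boolean isMapped(String ontologyUri) {
        return altLocations.containsKey(ontologyUri);
    }

    // Where an import should be read from: the mapped copy when there is
    // one, otherwise the original URI (which would trigger a remote fetch).
    String resolve(String ontologyUri) {
        return altLocations.getOrDefault(ontologyUri, ontologyUri);
    }
}
```

For example, after `mapper.addAltEntry("http://purl.bdrc.io/ontology/adm/types/License", "file:owl-schema/adm/types/License.ttl")` (the file path is assumed), `mapper.resolve(...)` for that URI returns the local copy, so reading admin.ttl would no longer need ldspdi to be up.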

eroux commented 5 years ago

Another solution is to just disable automatic imports (through setDynamicImports(false)), I'm not sure we need to actually import the things

xristy commented 5 years ago

I also agree that the FileManager and OntDocumentManager are necessary ingredients.

I think shacl is a red herring. Whether aut: imports an ontology that it depends on has nothing to do with shacl, but rather with proper specification. Many of the prefixes that are defined are for well-known ontologies/vocabularies and don't even have proper ontology services backing them at all. Try fetching dublin core based on the namespace:

curl --header "Accept: application/rdf+xml" http://purl.org/dc/elements/1.1/

or the bibframe namespace:

curl --header "Accept: application/rdf+xml" http://id.loc.gov/ontologies/bibframe/

In any event what are the functionalities of ldspdi w.r.t. ontology?

First, I would say that if a namespace is fetched, the response should be the document in the requested serialization. So if we ask for:

curl --header "Accept: text/turtle" http://www.w3.org/2002/07/owl#

we get just the owl document with the import of rdfs:. Similarly, if we run:

curl --header "Accept: text/turtle" http://purl.bdrc.io/ontology/admin/

we should get the contents of admin.ttl without the rest of the content implied by the owl:imports.

Second, there is browsing the entire union ontology or using that ontology for validation or whatever. This should be a separate issue, separate endpoints.

The ontology service by itself should be able to use the Jena FileManager and OntDocumentManager to load and cache the files and just return the requested serialization. That's really all that the basic ontology service needs to do to satisfy a request for a specific namespace of ours.

xristy commented 5 years ago

I think updating bdo:ontologySchema on Fuseki when there's a commit to owl-schema will require recursive importing unless Fuseki can be requested to do it.

MarcAgate commented 5 years ago

The ontology service has been rebuilt as follows:

Browsing and loading

We now have the ability to browse each ontology separately. Each ontology is loaded separately, without importing sub-ontologies. The main ontology homepages (admin, bdo, auth) are directly accessible from the dropdown list. Secondary (imported) ontology homepages are available from the home page of the importing ontology (see http://purl.bdrc.io/ontology/core/).

Updating with callbacks

Besides "attaching" each ontology to its own simple model (i.e. without imports), the ontology service also maintains a model combining all ontologies in one graph. This "global" model is used for serving (browsing) any URI that is not a base URI (a prefix namespace). It is also used for updating Fuseki each time a change occurs on the owl-schema github repository, in which case inference is applied to the model before it is pushed to Fuseki.
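The two-level arrangement described above (one simple model per ontology plus one global model) can be sketched in plain Java. This is an illustration only, not the code from the commit: `OntologyRegistrySketch` is a hypothetical name, and sets of strings stand in for Jena models.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: string sets stand in for Jena Model instances.
class OntologyRegistrySketch {
    // base URI -> statements from that ontology's own ttl (imports not followed)
    private final Map<String, Set<String>> byBaseUri = new LinkedHashMap<>();
    // union of the statements of every registered ontology
    private final Set<String> global = new LinkedHashSet<>();

    // Attach an ontology to its own simple model and merge it into the global one.
    void register(String baseUri, Set<String> statements) {
        byBaseUri.put(baseUri, new LinkedHashSet<>(statements));
        global.addAll(statements);
    }

    // A base URI is the namespace of one of the registered ontologies.
    boolean isBaseUri(String uri) {
        return byBaseUri.containsKey(uri);
    }

    // Base URIs are served from their own model, any other URI from the global model.
    Set<String> modelFor(String uri) {
        Set<String> own = byBaseUri.get(uri);
        return own != null ? own : global;
    }
}
```

With admin and core registered, `modelFor("http://purl.bdrc.io/ontology/admin/")` returns only the admin statements, while a request for a non-base URI under either namespace is answered from the combined set.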

Serving ontologies files

A serialization in any Jena-supported language is available for each valid individual (and single) ontology, using a command like this one:

curl -v -H "Accept:text/turtle" http://purl.bdrc.io/ontology/types/Transliteration
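On the server side, a request like this implies mapping the Accept header to a serialization language. A minimal sketch of such a mapping is shown below, assuming only four media types are handled and ignoring q-value ordering. `AcceptToLang` and `langFor` are hypothetical names, not taken from ldspdi. The returned strings are language names that Jena accepts when writing a model.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: maps an HTTP Accept header to a Jena language name.
class AcceptToLang {
    private static final Map<String, String> LANG_BY_MIME = Map.of(
        "text/turtle", "TURTLE",
        "application/rdf+xml", "RDF/XML",
        "application/n-triples", "N-TRIPLES",
        "application/ld+json", "JSON-LD");

    // Returns the language for the first supported media type in the header,
    // in header order. Parameters such as ";q=0.9" are stripped, not ranked.
    static Optional<String> langFor(String acceptHeader) {
        if (acceptHeader == null) {
            return Optional.empty();
        }
        for (String part : acceptHeader.split(",")) {
            int semi = part.indexOf(';');
            String mime = (semi >= 0 ? part.substring(0, semi) : part)
                .trim()
                .toLowerCase(Locale.ROOT);
            String lang = LANG_BY_MIME.get(mime);
            if (lang != null) {
                return Optional.of(lang);
            }
        }
        return Optional.empty();
    }
}
```

An empty result would typically translate into a default serialization or an HTTP 406 response; which of the two ldspdi chooses is not stated in this thread.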

MarcAgate commented 5 years ago

All fixes in commit 0070f61

eroux commented 5 years ago

nice thanks a lot! does the webhook still work fine?

MarcAgate commented 5 years ago

Yes, we tested it yesterday with Chris!

eroux commented 5 years ago

cool, excellent!