dgarijo / Widoco

Wizard for documenting ontologies. WIDOCO is a step by step generator of HTML templates with the documentation of your ontology. It uses the LODE environment to create part of the template.
Apache License 2.0

Parsing external owl entities (direct/indirect imports) + individual facts with external owl entities #668

Closed vChavezB closed 9 months ago

vChavezB commented 9 months ago

This solves #667 .

This PR tags OWL entities that are not part of the main ontology (i.e., direct/indirect imports) as external properties. An ExternalPropertyParser class, built on the owlapi, looks for the external property superscripts in the HTML generated by LODE and tries to find the corresponding OWL entity based on its IRI.

The XSLT extraction sheet was modified to add the external property superscript and the class type-ep. In addition, named individual facts that contain IRIs from imported ontologies are now added.
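As a rough illustration of the "find the tagged superscripts" step, here is a minimal sketch that uses only the Java standard library. It is not the ExternalPropertyParser from this PR. The markup it expects (an anchor whose href is the entity IRI, immediately followed by a `<sup class="type-ep">`) is an assumption made for the example, and `ExternalTagScanner` and `findExternalIris` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects the IRIs of entities tagged as external. */
final class ExternalTagScanner {

    // Assumed markup: <a href="IRI">label</a><sup class="type-ep" ...>ep</sup>
    private static final Pattern TAGGED = Pattern.compile(
        "<a\\s+href=\"([^\"]+)\"[^>]*>[^<]*</a>\\s*<sup\\s+class=\"type-ep\"[^>]*>");

    /** Returns the href of every anchor directly followed by a type-ep superscript. */
    static List<String> findExternalIris(String html) {
        List<String> iris = new ArrayList<>();
        Matcher m = TAGGED.matcher(html);
        while (m.find()) {
            iris.add(m.group(1));
        }
        return iris;
    }

    public static void main(String[] args) {
        String html =
            "<a href=\"http://xmlns.com/foaf/0.1/knows\">knows</a>"
            + "<sup class=\"type-ep\" title=\"external property\">ep</sup>";
        System.out.println(findExternalIris(html));
    }
}
```

Each IRI collected this way could then be looked up in the loaded ontology and its imports to decide which superscript should replace the generic external tag. A real implementation would more likely use an HTML parser than a regular expression.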

Example

The assertions look as follows for the sample ontology I provided in #667:

[screenshot of the generated assertions]

In this case the only assertion that was not found is foaf:membershipclass, because I did not import foaf and I am directly using a definition that is not loaded in the ontology.

dgarijo commented 9 months ago

@vChavezB thanks for this contribution! The results look great. I am wondering: is it possible to have this directly in the XSLT transformation? It feels a little hackish to have some of the work done in the XSLT and some added directly afterwards.

vChavezB commented 9 months ago

The XSLT transformation sheets only work with the data provided by an XML-serialized ontology, in this case the serialization provided here.

The transformation cannot know the type of an OWL object from its URI if the content is not present in the XML. You would need a run-time environment to load the imports and find the missing URIs, which is what I propose with the ExternalParser. Perhaps you could do the same in XSLT, but it would require more effort, as you would need to retrieve the imports, load them as RDF/XML and then do the parsing.

One thing that is missing is that the XSLT transformation has a language file that provides the appropriate translation. This is obtained dynamically here with the XSLT function getDescriptionLabel. What would need to be added to this PR is loading these same files (en.xml, de.xml, etc.) to reproduce this functionality. At the moment I just add the title to the superscript in English, such as here.
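The point that an IRI alone does not reveal the entity type can be sketched as follows. This is a simplified stand-in written with the Java standard library, not owlapi code: `EntityKindIndex`, `declare` and `kindOf` are hypothetical names, and the index is assumed to be filled from the declarations found in the main ontology and its loaded imports.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical index from IRI to declared entity kind. */
final class EntityKindIndex {

    enum Kind { CLASS, OBJECT_PROPERTY, DATA_PROPERTY, ANNOTATION_PROPERTY, NAMED_INDIVIDUAL }

    private final Map<String, Kind> kinds = new LinkedHashMap<>();

    /** Records a declaration; the first declaration seen for an IRI wins. */
    void declare(String iri, Kind kind) {
        kinds.putIfAbsent(iri, kind);
    }

    /** Empty when no loaded ontology declares the IRI, so the type stays unknown. */
    Optional<Kind> kindOf(String iri) {
        return Optional.ofNullable(kinds.get(iri));
    }

    public static void main(String[] args) {
        EntityKindIndex index = new EntityKindIndex();
        index.declare("http://xmlns.com/foaf/0.1/Person", Kind.CLASS);
        System.out.println(index.kindOf("http://xmlns.com/foaf/0.1/Person"));
        System.out.println(index.kindOf("http://example.org/notLoaded"));
    }
}
```

An IRI that was never declared in any loaded document yields an empty result, which corresponds to the case above where an entity is used without importing the ontology that defines it.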

dgarijo commented 9 months ago

I see, thanks! However, if I recall correctly, there is an option to document not only the ontology, but the ontology + imports. If that's used, then that would address the type, no?

Or is it that the target ontology is not even imported, just reused, and therefore you don't even know?

vChavezB commented 9 months ago

I did a quick test with the option you mentioned (-includeImportedOntologies). It solves the issue, because the imported ontology is included in the RDF/XML serialization, but the documentation becomes unreadable with all the imported definitions. For the example ontology I uploaded in issue #667, I get all the assertions from foaf.

[screenshot of the documentation including the foaf assertions]

However, the use case I was thinking of is when you don't want to include the imported ontology's definitions in your documentation.

For example, I am importing an ontology of units (qudt). It has around thirty thousand assertions for a vocabulary of units, which would pollute my documentation. In this case I would not want to document them, as the qudt organization already provides documentation at qudt.org.

Perhaps I could make this functionality available only when the user does not provide the option -includeImportedOntologies.

dgarijo commented 9 months ago

@vChavezB one question: are you importing the external vocabularies in your ontology? If you are not, then I agree with an external parser that brings in the information from the ontologies, but that would require downloading them, etc.

If the ontologies are imported but you don't want to pollute the doc, maybe we can have 2 loaded models (one for the ontology, one for the ontology + imported) and do the xslt transformation on the simple one and the xslt for the individuals on the complete one. Then, mix the individual section of one with the simplified documentation of the other.

I am brainstorming here; it's just that doing things outside the XSLT still looks to me like a hard-to-maintain solution. Also, external properties would only be added for individuals, making the rest of the documentation inconsistent (e.g., if you are extending existing external classes or properties).

vChavezB commented 9 months ago

Just an update on the current PR.

are you importing the external vocabularies in your ontology?

Yes I am importing the external vocabularies (i.e., owl:imports).

maybe we can have 2 loaded models

That could also work. Just a minor detail I have found.

I noticed this while using an ontology with a vocabulary of units (qudt), which in turn imports the main schema with object properties such as has unit, has quantity kind, etc. As these are not defined in the directly imported ontology but come from an indirect import, the XSLT transformation will be missing this information.

So for this alternative, all the imports would have to be added recursively, and the information would then be taken from this second serialized model, either with another XSLT transformation sheet (?) or with a Java implementation that does this.

From my point of view, working with the owlapi is more practical, as there is no need to create a second ontology, serialize it and then extract the metadata from the XML. With the owlapi I can just look up the OWL entities and find their metadata; the external property tag only marks which OWL entities have to be looked up.
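To make "recursively added" concrete, here is a minimal sketch of collecting direct and indirect imports with a breadth-first traversal, using only the Java standard library. The `directImports` map and the ontology names are hypothetical inputs; in practice the import statements would come from the loaded ontology documents.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: computes the set of ontologies reachable through imports. */
final class ImportsClosure {

    /** Returns the root plus every directly or indirectly imported ontology, in visit order. */
    static Set<String> of(String root, Map<String, List<String>> directImports) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (!seen.add(current)) {
                continue; // already visited, which also guards against import cycles
            }
            for (String imported : directImports.getOrDefault(current, List.of())) {
                queue.add(imported);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Placeholder names: the main ontology imports a units vocabulary,
        // which in turn imports a schema.
        Map<String, List<String>> imports = Map.of(
            "main", List.of("units"),
            "units", List.of("schema"));
        System.out.println(ImportsClosure.of("main", imports));
    }
}
```

In this example the schema is reached only through the units vocabulary, which is the indirect-import case described above.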

I am not against the other alternative you suggest but I will probably not have time to develop a second solution.

dgarijo commented 9 months ago

@vChavezB I understand. Let me review the PR and approve it when I have a bit of time. Thanks again for your contributions!

vChavezB commented 9 months ago

@dgarijo ok, let me see if I can create a test case so it's easier to check automatically in the future.

vChavezB commented 9 months ago

I have added a test case that parses the generated HTML and asserts that the superscripts for the OWL entities are correct.