controlling HTML rendering for large models

fennibay commented 1 year ago

Is your feature request related to a problem? Please describe.

When rendering large models (e.g. with lots of classes), the resulting HTML becomes too big: it is hard to display in browsers and it causes issues in our rendering pipeline (specifically with mkdocs, where we treat the rendered result as a page included in an mkdocs-based web application).

We'd like to have a feature that lets us minimize the HTML content, focusing on the high-level description ("header") of the model, while still keeping the full model content in the RDF files.

Describe the solution you'd like

Introduce a new option -renderURIs (alongside the existing options).
This option takes as a parameter the URIs to be rendered: maybe just as a list on the command line, or read from a txt file, or maybe even via a SPARQL CONSTRUCT query that returns the relevant URIs.
Only these URIs will be rendered in the HTML.
In the call to Widoco, we will use this new option to specify the URIs that we deem relevant for the HTML.
No effect on model conversion to other RDF formats; the option will only affect the HTML rendering.
If the option is left out, the whole model will be rendered to HTML (current behavior).
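
To illustrate the intended behavior, here is a minimal Python sketch of how such an option could be parsed. It is not Widoco code: -renderURIs is only the option proposed in this request, and the "@file" convention for reading URIs from a text file as well as the name parse_render_uris are assumptions made up for this example.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence, Set


def parse_render_uris(argv: Sequence[str]) -> Optional[Set[str]]:
    """Return the set of URIs to render, or None to render the whole model."""
    parser = argparse.ArgumentParser(prog="widoco-sketch")
    # Hypothetical option from this request; it is not an existing Widoco flag.
    parser.add_argument("-renderURIs", dest="render_uris", default=None)
    args = parser.parse_args(list(argv))

    if args.render_uris is None:
        # Option left out: keep the current behavior and render everything.
        return None

    value = args.render_uris
    if value.startswith("@"):
        # Assumed convention: "@path" reads one URI per line from a text file.
        entries = Path(value[1:]).read_text(encoding="utf-8").splitlines()
    else:
        # Otherwise the value is treated as a comma-separated list of URIs.
        entries = value.split(",")
    return {entry.strip() for entry in entries if entry.strip()}
```

A call such as parse_render_uris(["-renderURIs", "http://example.org/vocab"]) would then yield a one-element set, while an empty argument list yields None, meaning "render the whole model".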

Describe alternatives you've considered

Splitting the model file into header and content, rendering the header to HTML, and converting the header and content separately. -> With separate model files, it is hard for the model designer to judge upfront in which part something should be defined, and it creates additional work afterwards to handle multiple files and use them consistently.
Splitting the model file into header and content, rendering the header to HTML, and combining both parts for conversion. -> We need separate conversion and combining programs, and the HTML should link to the correct file.
Implementing a simple renderer ourselves. -> We'd need to duplicate some functionality of WiDoCo.

Additional context

Attached is an artificial example containing a base class with 10K subclasses: output.ttl.gz
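
For readers without the attachment, a file of comparable shape could be generated with a short Python sketch like the one below. The namespace http://example.org/vocab# and the names Base and SubN are made up for illustration and are not taken from output.ttl.gz.

```python
def make_vocabulary(n_subclasses: int, base: str = "http://example.org/vocab#") -> str:
    """Build a Turtle document with one base class and n direct subclasses."""
    lines = [
        f"@prefix : <{base}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
        # The ontology URI is the namespace without the trailing '#'.
        f"<{base.rstrip('#')}> a owl:Ontology .",
        ":Base a owl:Class .",
    ]
    for i in range(n_subclasses):
        # Each subclass carries very little information, as in a controlled vocabulary.
        lines.append(f":Sub{i} a owl:Class ; rdfs:subClassOf :Base .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write a 10K-subclass model next to the script (hypothetical file name).
    with open("generated.ttl", "w", encoding="utf-8") as handle:
        handle.write(make_vocabulary(10_000))
```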

@weissjoh

dgarijo commented 1 year ago

This behavior is not clear. What happens if one of the URIs to filter has subclasses? Should those be filtered too? If not, the hierarchy would break. What about properties whose domain or range is one of the filtered terms? What if you are trying to filter the URI of a property? Should the program then review the axioms in all other classes? I think the consequences of this request require some serious thinking: if you remove the terms to filter, you may significantly affect the model. If you remove only the classes but keep their semantics, then some work is needed. Perhaps if you have many models it is better to generate one HTML page per class and render them separately.
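
To make these questions concrete, here is a minimal Python sketch (not Widoco code) that lists the references which would point at nothing once only a subset of terms is kept. Triples are modeled as plain string tuples, and the function name dangling_references is hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def dangling_references(triples: Iterable[Triple], kept: Set[str]) -> List[Triple]:
    """Triples whose subject is kept but whose object is a model term that is not."""
    triples = list(triples)
    # Every term that appears as a subject is treated as defined in the model.
    defined = {s for s, _, _ in triples}
    return [
        (s, p, o)
        for s, p, o in triples
        if s in kept and o in defined and o not in kept
    ]


model = [
    (":Animal", "rdf:type", "owl:Class"),
    (":Dog", "rdf:type", "owl:Class"),
    (":Dog", "rdfs:subClassOf", ":Animal"),
    (":hasOwner", "rdfs:domain", ":Dog"),
]

# Keeping only :Dog leaves its superclass link pointing at a filtered-out class.
broken_hierarchy = dangling_references(model, {":Dog"})
# Keeping only the property leaves its domain pointing at a filtered-out class.
broken_domain = dangling_references(model, {":hasOwner"})
```

In this toy model, keeping only :Dog breaks the subclass link to :Animal, and keeping only :hasOwner breaks its domain reference to :Dog.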

fennibay commented 1 year ago

Thank you for the quick feedback.

You're right to point out that it is not clear what it means to render one URI.

I would make it more precise as follows: -renderURIs renders only the triples in which the specified URIs occur as the subject.
The onus is then on the caller to give a larger list of URIs if they are interested in rendering more. The SPARQL query idea (CONSTRUCT or SELECT would both work, I think) could make the caller's job easier.
For me specifically, I will only render the model URI as an entry point to the documentation, so the simplest option of specifying it on the command line would be sufficient.
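
The subject-based rule above can be sketched in a few lines of Python. This is only an illustration of the proposed semantics, not how Widoco works internally: triples are plain string tuples, and triples_to_render is a hypothetical name.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def triples_to_render(
    triples: Iterable[Triple], render_uris: Optional[Set[str]] = None
) -> List[Triple]:
    """Select the triples that would reach the HTML under the proposed rule."""
    if render_uris is None:
        # Option not given: render the whole model (current behavior).
        return list(triples)
    # Keep only triples whose subject is one of the requested URIs.
    return [t for t in triples if t[0] in render_uris]


model = [
    ("http://example.org/vocab", "rdf:type", "owl:Ontology"),
    ("http://example.org/vocab", "dcterms:title", "Example vocabulary"),
    ("http://example.org/vocab#Sub0", "rdfs:subClassOf", "http://example.org/vocab#Base"),
]

# Rendering only the model URI keeps the ontology header and drops the classes.
header_only = triples_to_render(model, {"http://example.org/vocab"})
```

The RDF serializations would still be produced from the full list of triples; only the input to the HTML renderer is narrowed.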

Alternatively, we could restrict renderURIs only to URIs of type owl:NamedIndividual and owl:Class, but that would couple it too strictly to OWL IMHO.

One HTML page per class would not solve my specific problem, because I really have lots of classes, with little information in each of them. Basically my model is a controlled vocabulary.

-renderURIs would affect only the HTML output, i.e. documentation. So the completeness of the model itself is not affected, only the documentation. Conceptually I don't see an issue rendering only part of the model to HTML, because typically a given model will contain references to other models, but it's practically sufficient to focus on one model.

dgarijo / Widoco

controlling HTML rendering for large models #578