INCATools / ontology-development-kit

Bootstrap an OBO Library ontology
http://incatools.github.io/ontology-development-kit/
BSD 3-Clause "New" or "Revised" License

Re-enable the LightRDF-based RDF/XML check. #928

Closed gouttegd closed 8 months ago

gouttegd commented 10 months ago

The ODK used to have a validation check that ensured that RDF/XML files were valid. The check was based on LightRDF, which was fast enough to make it reasonable to enable the check by default.

We had to disable that check because of a bug in LightRDF that caused the library to sometimes fail to parse perfectly valid RDF/XML files (#745). That bug has been fixed, so now we can re-enable the check by default.

The check-rdfxml script is now invoked by default, and by default it uses the fast LightRDF-based check. A new ODK option is added (extra_rdfxml_checks): when enabled, it instructs the check-rdfxml script to also perform the Jena- and RDFLib-based checks (which are not enabled by default as they are more time-consuming).
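The "fast check always, slower checks on request" behaviour described above can be sketched roughly as follows. This is only an illustration of the selection logic, not the actual check-rdfxml script: the function names (`select_checks`, `run_checks`, `xml_wellformed`) are hypothetical, and the placeholder fast check only tests XML well-formedness with the Python standard library, where the real script would call LightRDF, Jena, and RDFLib.

```python
from typing import Callable, Dict, List, Tuple
import xml.etree.ElementTree as ET

# A checker takes a file path and raises an exception if it rejects the file.
Checker = Callable[[str], None]


def xml_wellformed(path: str) -> None:
    """Placeholder fast check (assumption): XML well-formedness only,
    not a real RDF/XML parse."""
    ET.parse(path)


def select_checks(fast: Dict[str, Checker],
                  extra: Dict[str, Checker],
                  extra_rdfxml_checks: bool = False) -> Dict[str, Checker]:
    """Fast checks always run; slower ones only when the option is enabled."""
    selected = dict(fast)
    if extra_rdfxml_checks:
        selected.update(extra)
    return selected


def run_checks(path: str, checks: Dict[str, Checker]) -> List[Tuple[str, str]]:
    """Run every selected check and return (check name, error message) pairs."""
    failures = []
    for name, check in checks.items():
        try:
            check(path)
        except Exception as exc:  # any parser error counts as a failed check
            failures.append((name, str(exc)))
    return failures
```

With the option left at its default, only the fast check is selected, so the cost of the slower parsers is paid only by projects that explicitly ask for them.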

closes #892

gouttegd commented 10 months ago

PR blocked until the next time the Python package constraints are updated, so that the ODK gets the new version of LightRDF (either 0.3.2 or 0.4.0) in which the aforementioned bug has been fixed.

matentzn commented 9 months ago

Have you tried validating a bunch of release files (Uberon, CL, FBbt, for example) to see whether any issues they might reveal are actually solvable?

gouttegd commented 9 months ago

Not sure I understand your question. As far as I know FBbt, Uberon, and CL have never produced invalid RDF/XML files. What potentially (un)solvable issues are you talking about?

matentzn commented 9 months ago

Concretely: did you check that the current OWL API serialiser produces valid RDF/XML according to LightRDF for a variety of key ontologies?

gouttegd commented 9 months ago

No, but if the OWL API was producing invalid RDF/XML, I sure hope it would have been caught before by the OWL API’s own test suite. The goal here is not to check the OWL API, but to catch possibly invalid contents that the OWL API will not produce itself but that it may let pass (typically an invalid IRI).

I have not retested now, but no such content was present in Uberon, CL, or FBbt when that check was first implemented in September 2022, nor during the two months before it was removed because of the LightRDF bug.
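To illustrate the kind of content meant here, the sketch below shows a file that is perfectly well-formed XML yet carries an IRI with a literal space in it. It is a simplified stand-in written with the Python standard library, not the LightRDF-based check itself: the function name `find_suspicious_iris` and the example.org IRIs are made up for the example, and the test only looks for characters that cannot appear literally in an IRI reference (the set excluded by the IRIREF production in the Turtle and N-Triples grammars), which is far from full IRI validation.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# RDF/XML attributes whose value is an IRI reference.
IRI_ATTRS = {
    f"{{{RDF_NS}}}about",
    f"{{{RDF_NS}}}resource",
    f"{{{RDF_NS}}}datatype",
}

# Control characters, space, and < > " { } | ^ ` \ are not allowed
# unescaped in an IRI reference.
FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def find_suspicious_iris(rdfxml: str) -> List[str]:
    """Return IRI attribute values that contain a forbidden character."""
    root = ET.fromstring(rdfxml)  # raises ParseError if not well-formed XML
    suspicious = []
    for elem in root.iter():
        for attr, value in elem.attrib.items():
            if attr in IRI_ATTRS and FORBIDDEN.search(value):
                suspicious.append(value)
    return suspicious


# Hypothetical sample: the second class IRI contains a space.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/EX_0000001"/>
  <owl:Class rdf:about="http://example.org/EX 0000002"/>
</rdf:RDF>"""

if __name__ == "__main__":
    print(find_suspicious_iris(SAMPLE))
```

An XML parser accepts this sample without complaint, which is why a check that understands RDF/XML and IRIs is needed on top of plain XML parsing.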

gouttegd commented 8 months ago

@matentzn Did you get to test that on -dev? It’s been working fine for me so far.

matentzn commented 8 months ago

I tested this with two ontologies, good to go!