diachron / quality

Dataset Quality Assessment (part of WP5 of the Diachron EU FP7 project)
MIT License

ValidOWL metric #36

Open clange opened 10 years ago

clange commented 10 years ago

Implement a ValidOWL metric (category: Intrinsic; dimension: Consistency) that determines whether the given RDF dataset is a valid OWL ontology.

At the very least, this metric should return true or false.

In Jena it should be possible to try parsing an RDF graph as OWL (which means that additional consistency rules are checked) and to obtain error messages if the RDF graph is not valid OWL.

After this basic step we might be able to go a step further and determine the ratio of triples that are invalid w.r.t. the OWL semantics. For example, `owl:Class owl:Class owl:Class .` is a valid RDF triple but doesn't make sense in OWL. Jena might be able to give us a list of such invalid triples for free. If Jena doesn't, maybe the OWL API does. (Not sure it supports streaming; let's find out.)
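To make the two steps concrete, here is a minimal sketch of how the boolean result and the ratio could both be derived from one list of invalid triples. Everything in it is hypothetical: `Triple`, `ValidOwlSketch` and `TOY_INVALID` are illustration names, and the toy check only flags triples whose predicate is `owl:Class`. It stands in for whatever Jena or the OWL API would actually report and is not a real OWL validity test. Treating an empty dataset as ratio 0.0 is also just an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** A plain subject/predicate/object triple, kept as strings for this sketch. */
final class Triple {
    final String s, p, o;
    Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    @Override public String toString() { return s + " " + p + " " + o + " ."; }
}

final class ValidOwlSketch {
    // Toy stand-in for a real OWL check (assumption): flags triples that use
    // owl:Class as a predicate. A real implementation would ask the OWL parser.
    static final Predicate<Triple> TOY_INVALID = t -> t.p.equals("owl:Class");

    /** Collects the triples that the supplied check reports as invalid. */
    static List<Triple> invalidTriples(List<Triple> triples, Predicate<Triple> isInvalid) {
        List<Triple> bad = new ArrayList<>();
        for (Triple t : triples) {
            if (isInvalid.test(t)) bad.add(t);
        }
        return bad;
    }

    /** Basic step: the dataset passes only if no triple is flagged. */
    static boolean isValid(List<Triple> triples, Predicate<Triple> isInvalid) {
        return invalidTriples(triples, isInvalid).isEmpty();
    }

    /** Second step: invalid triples divided by all triples (0.0 if empty). */
    static double invalidRatio(List<Triple> triples, Predicate<Triple> isInvalid) {
        if (triples.isEmpty()) return 0.0;
        return (double) invalidTriples(triples, isInvalid).size() / triples.size();
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
            new Triple("ex:Person", "rdf:type", "owl:Class"),
            new Triple("owl:Class", "owl:Class", "owl:Class"),
            new Triple("ex:alice", "rdf:type", "ex:Person"),
            new Triple("ex:Person", "rdfs:label", "\"Person\""));
        System.out.println(isValid(data, TOY_INVALID));      // false
        System.out.println(invalidRatio(data, TOY_INVALID)); // 0.25
    }
}
```

Passing the check in as a `Predicate` keeps the metric arithmetic independent of which library ends up producing the list of invalid triples.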

@nfriesen: Before we invest a lot of effort into using the OWL API, let's talk to the Repairing partners.

@muhammadaliqasmi: a note about the second step. If we manage to identify every individual triple that is not valid OWL, this also covers the job of MisusedOwlDatatypeOrObjectProperties, which is a special case of "finding all triples that are not valid OWL". We could then refactor it to reuse some of the implementation of the ValidOWL metric, so that we only need to run the OWL parser once.
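A rough sketch of what that refactoring might look like, assuming the parser pass can label each invalid triple with a reason. All names here (`ViolationKind`, `Violation`, `OwlCheckResult`, `SharedPassDemo`) are hypothetical, and the violations in `main` are hand-written stand-ins for parser output. It does not reflect the existing MisusedOwlDatatypeOrObjectProperties code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical violation categories; a real OWL parser would define its own. */
enum ViolationKind { MISUSED_DATATYPE_OR_OBJECT_PROPERTY, OTHER }

/** One invalid triple together with the reason it was rejected. */
final class Violation {
    final String triple;
    final ViolationKind kind;
    Violation(String triple, ViolationKind kind) {
        this.triple = triple;
        this.kind = kind;
    }
}

/** Result of a single validation pass, shared by both metrics. */
final class OwlCheckResult {
    private final List<Violation> violations;

    OwlCheckResult(List<Violation> violations) {
        this.violations = Collections.unmodifiableList(new ArrayList<>(violations));
    }

    /** ValidOWL: true only if the pass found no violation at all. */
    boolean validOwl() {
        return violations.isEmpty();
    }

    /** MisusedOwlDatatypeOrObjectProperties: the subset with the matching kind. */
    List<Violation> misusedProperties() {
        List<Violation> out = new ArrayList<>();
        for (Violation v : violations) {
            if (v.kind == ViolationKind.MISUSED_DATATYPE_OR_OBJECT_PROPERTY) out.add(v);
        }
        return out;
    }
}

final class SharedPassDemo {
    public static void main(String[] args) {
        // Stand-in for parser output; assumes ex:knows is declared an owl:ObjectProperty.
        OwlCheckResult result = new OwlCheckResult(List.of(
            new Violation("ex:alice ex:knows \"Bob\" .",
                          ViolationKind.MISUSED_DATATYPE_OR_OBJECT_PROPERTY),
            new Violation("owl:Class owl:Class owl:Class .", ViolationKind.OTHER)));
        System.out.println(result.validOwl());                 // false
        System.out.println(result.misusedProperties().size()); // 1
    }
}
```

Both metrics read from the same `OwlCheckResult`, so the expensive pass would run once per dataset.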

(Background: D3.1 Table 20 on page 90)