diachron / quality

Dataset Quality Assessment (part of WP5 of the Diachron EU FP7 project)
MIT License

UndefinedClassesOrProperties: really check whether the target is a class or property #43

Status: Open. clange opened this issue 10 years ago.

clange commented 10 years ago

The current implementation of UndefinedClassesOrProperties finds triples in which a class or property is expected in the object position and then checks whether that “object resource” is accessible to the VocabularyReader. If the resource is found, it does not check whether the resource actually is a class or property (see the example below).
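To make the gap concrete, here is a minimal Python sketch of the two steps described above. It is not the metric's actual code: which predicates "expect" a class or property is an illustrative assumption (the issue does not list them), triples are plain `(subject, predicate, object)` string tuples, and `vocabulary` is a hypothetical dict standing in for whatever the VocabularyReader returns for a URI.

```python
# Hypothetical sketch, not the metric's implementation.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Assumed mapping: predicates whose object should be a class or a property.
EXPECTS_CLASS = {
    RDF + "type",
    RDFS + "subClassOf",
    RDFS + "domain",
    RDFS + "range",
    OWL + "equivalentClass",
    OWL + "disjointWith",
}
EXPECTS_PROPERTY = {
    RDFS + "subPropertyOf",
    OWL + "equivalentProperty",
    OWL + "inverseOf",
}


def expected_kind(predicate):
    """Return 'class', 'property', or None for a predicate URI."""
    if predicate in EXPECTS_CLASS:
        return "class"
    if predicate in EXPECTS_PROPERTY:
        return "property"
    return None


def candidate_triples(triples):
    """Yield (triple, kind) for triples whose object should be a class or property."""
    for s, p, o in triples:
        kind = expected_kind(p)
        if kind is not None:
            yield (s, p, o), kind


def current_check(obj, vocabulary):
    """Mimics the behaviour described in this issue: passes if *any* data
    for obj is available, without looking at what that data says."""
    return bool(vocabulary.get(obj))
```

`current_check` returns `True` for any resource that has some data, which is exactly why a defined individual slips through.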

So we need to check whether the data found for that resource (usually the data downloaded from the object URI) contains evidence that it is an rdfs:Class, an owl:Class, or an rdf:Property. (Note that every owl:Class is also an rdfs:Class, and that OWL defines many specializations of rdf:Property, such as owl:ObjectProperty or owl:TransitiveProperty. I can write down the full list here once we start implementing this; please let me know.)

Let <o> be the URI of the object. Looking only at the data, without doing OWL reasoning, we can look for, e.g., <o> rdf:type owl:Class; if we find it, we know that the triple <...> rdf:type <o> is a “good” triple with respect to this metric. We can even look for <o> ?p ?o and will know that …
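A data-only check along these lines might look like the following Python sketch. It only inspects asserted `<o> rdf:type ?t` triples, with no RDFS/OWL inference. The type lists are a partial, illustrative selection (the full list is deferred above), and `data` is assumed to be the triples found for `<o>`, as `(subject, predicate, object)` string tuples. The truncated `<o> ?p ?o` idea is deliberately not implemented.

```python
# Minimal sketch under stated assumptions; no reasoning is performed.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = RDF + "type"

# Partial lists: types that count as direct evidence when asserted via rdf:type.
CLASS_TYPES = {RDFS + "Class", OWL + "Class", RDFS + "Datatype", OWL + "Restriction"}
PROPERTY_TYPES = {
    RDF + "Property",
    OWL + "ObjectProperty",
    OWL + "DatatypeProperty",
    OWL + "AnnotationProperty",
    OWL + "TransitiveProperty",
    OWL + "SymmetricProperty",
    OWL + "FunctionalProperty",
    OWL + "InverseFunctionalProperty",
}


def declared_types(obj, data):
    """Objects ?t of triples <obj> rdf:type ?t found in data (no inference)."""
    return {t for s, p, t in data if s == obj and p == RDF_TYPE}


def looks_like_class(obj, data):
    """True if data directly types obj as one of CLASS_TYPES."""
    return bool(declared_types(obj, data) & CLASS_TYPES)


def looks_like_property(obj, data):
    """True if data directly types obj as one of PROPERTY_TYPES."""
    return bool(declared_types(obj, data) & PROPERTY_TYPES)
```

Because only direct `rdf:type` assertions are considered, a resource typed with some vocabulary-specific subclass of owl:Class would not be recognized; covering that case would require the inference this sketch avoids.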

Example: imagine a triple <...> rdf:type socialnetwork:Alice, where socialnetwork:Alice rdf:type foaf:Person; i.e., socialnetwork:Alice is not an owl:Class but an owl:Individual (which is declared to be disjoint with owl:Class). This is a “bad” triple even though socialnetwork:Alice is defined.
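The Alice example can be replayed with a small Python sketch that separates the three outcomes: nothing found ("undefined", which the current metric already reports), data found but no direct class typing ("not-a-class"), and "ok". Treating a missing class declaration as a failure is an assumption of this sketch. The socialnetwork namespace URI and the vocabulary contents (including the foaf:Person entry) are made-up sample data, and `vocabulary` is a hypothetical stand-in for the VocabularyReader.

```python
# Hypothetical sketch: classify <s> rdf:type <o> triples using only asserted types.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = RDF + "type"

CLASS_TYPES = {RDFS + "Class", OWL + "Class"}  # partial, illustrative


def check_type_triple(triple, vocabulary):
    """Return 'undefined', 'not-a-class' or 'ok' for an rdf:type triple.

    vocabulary: hypothetical dict mapping a URI to the triples found for it.
    Assumption: absence of a direct class declaration counts as 'not-a-class'.
    """
    _, predicate, obj = triple
    if predicate != RDF_TYPE:
        raise ValueError("only rdf:type triples are handled in this sketch")
    data = vocabulary.get(obj)
    if not data:
        return "undefined"  # what the current metric already detects
    types = {t for s, p, t in data if s == obj and p == RDF_TYPE}
    return "ok" if types & CLASS_TYPES else "not-a-class"


# Sample data (made up for illustration).
ALICE = "http://example.org/socialnetwork/Alice"
vocabulary = {
    ALICE: [(ALICE, RDF_TYPE, FOAF + "Person")],
    FOAF + "Person": [(FOAF + "Person", RDF_TYPE, OWL + "Class")],
}

if __name__ == "__main__":
    print(check_type_triple(("http://example.org/x", RDF_TYPE, ALICE), vocabulary))
    # prints "not-a-class": Alice is defined, but only as a foaf:Person
    print(check_type_triple(("http://example.org/x", RDF_TYPE, FOAF + "Person"), vocabulary))
    # prints "ok"
```

The current metric would accept both triples because both objects are defined; only the type check distinguishes them.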