diachron / quality

Dataset Quality Assessment (part of WP5 of the Diachron EU FP7 project)
MIT License

UndefinedClassesOrProperty metric implementation might not be entirely correct #31

Open clange opened 10 years ago

clange commented 10 years ago

I'm not following the literature here but rather just my own intuition. @jerdeb, could you please compare this metric's implementation with the literature, and post comments to this issue as appropriate?

The current implementation looks into the quad's subject, and I'm not sure that's necessary: when you reuse an ontology (and don't hijack namespaces, for which we have a separate metric), you usually don't redefine its classes or properties.

The current implementation also assumes that a property is only defined if it has both a domain and a range. However, in OWL ontologies properties are commonly declared as subproperties of other properties, or as instances of "object property" (owl:ObjectProperty), "transitive property" (owl:TransitiveProperty), etc., and that is perfectly sufficient for a property to "be defined".
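To make the relaxed criterion concrete, here is a minimal Python sketch, not the project's code. It assumes triples are plain (subject, predicate, object) tuples of IRI strings and that the loaded ontology is a set of such tuples; `is_defined_property` and the contents of `PROPERTY_TYPES` are hypothetical choices for illustration. A property counts as defined as soon as any one of these triples exists, instead of requiring both a domain and a range.

```python
# Sketch only: decide whether a property counts as "defined" in a vocabulary.
# Assumptions: triples are (subject, predicate, object) tuples of IRI strings,
# and the vocabulary is a set of such tuples. Names here are hypothetical.

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# rdf:type values that, on their own, declare a term to be a property
PROPERTY_TYPES = {
    RDF + "Property",
    OWL + "ObjectProperty",
    OWL + "DatatypeProperty",
    OWL + "AnnotationProperty",
    OWL + "TransitiveProperty",
    OWL + "SymmetricProperty",
    OWL + "FunctionalProperty",
    OWL + "InverseFunctionalProperty",
}

# Predicates that also count as a definition when the property is their subject
DEFINING_PREDICATES = {RDFS + "subPropertyOf", RDFS + "domain", RDFS + "range"}


def is_defined_property(prop, vocab):
    """Return True if any single triple in vocab is enough to define prop."""
    for s, p, o in vocab:
        if s != prop:
            continue
        if p == RDF + "type" and o in PROPERTY_TYPES:
            return True
        if p in DEFINING_PREDICATES:
            return True
    return False


EX = "http://example.org/vocab#"
vocab = {
    (EX + "bestFriend", RDFS + "subPropertyOf", "http://xmlns.com/foaf/0.1/knows"),
    (EX + "ancestorOf", RDF + "type", OWL + "TransitiveProperty"),
}
print(is_defined_property(EX + "bestFriend", vocab))  # True
print(is_defined_property(EX + "ancestorOf", vocab))  # True
print(is_defined_property(EX + "unknown", vocab))     # False
```

Accepting rdfs:domain or rdfs:range alone is a design choice: under RDFS semantics either one already implies that the term is an rdf:Property.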

Also, I think checking whether the object is a defined class is only relevant when the predicate is rdf:type. If the predicate is, say, foaf:knows, the object could be anything, e.g. any other instance from our dataset, and we don't care, at least not for this metric.
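A rough Python sketch of the checks proposed so far (ignore the subject, always check the predicate, check the object only when the predicate is rdf:type) could look like this. It is not the project's implementation; it assumes the same tuple-of-IRI-strings model, treats terms in the RDF, RDFS and OWL namespaces as built-in so that rdf:type itself is never flagged, and considers a class defined if it is typed as rdfs:Class or owl:Class. All function names are hypothetical.

```python
# Sketch only: find undefined classes/properties in a dataset against a vocabulary.
# Assumptions: triples are (subject, predicate, object) tuples of IRI strings;
# terms in the RDF, RDFS and OWL namespaces are treated as built-in.

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = RDF + "type"
BUILTIN_NAMESPACES = (RDF, RDFS, OWL)

CLASS_TYPES = {RDFS + "Class", OWL + "Class"}
PROPERTY_TYPES = {RDF + "Property", OWL + "ObjectProperty", OWL + "DatatypeProperty"}


def is_defined_class(cls, vocab):
    """True if the vocabulary types cls as rdfs:Class or owl:Class."""
    return any(s == cls and p == RDF_TYPE and o in CLASS_TYPES for s, p, o in vocab)


def is_defined_property(prop, vocab):
    """True if prop is typed as a property or declared a subproperty (simplified)."""
    return any(
        s == prop
        and ((p == RDF_TYPE and o in PROPERTY_TYPES) or p == RDFS + "subPropertyOf")
        for s, p, o in vocab
    )


def undefined_terms(dataset, vocab):
    """Collect undefined predicates, plus undefined classes used as objects of rdf:type."""
    undefined = set()
    for _subject, p, o in dataset:  # the subject is deliberately not inspected
        if not p.startswith(BUILTIN_NAMESPACES) and not is_defined_property(p, vocab):
            undefined.add(p)
        # Only for rdf:type is the object expected to be a class
        if p == RDF_TYPE and not o.startswith(BUILTIN_NAMESPACES) and not is_defined_class(o, vocab):
            undefined.add(o)
    return undefined


FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/data/"
vocab = {
    (FOAF + "Person", RDF_TYPE, OWL + "Class"),
    (FOAF + "knows", RDF_TYPE, OWL + "ObjectProperty"),
}
dataset = [
    (EX + "alice", RDF_TYPE, FOAF + "Person"),
    (EX + "alice", FOAF + "knows", EX + "bob"),  # object is an instance: not checked
    (EX + "bob", RDF_TYPE, FOAF + "Persn"),      # misspelt class: flagged
]
print(undefined_terms(dataset, vocab))  # {'http://xmlns.com/foaf/0.1/Persn'}
```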

If a dataset consists not only of instance data but also defines some of its local vocabulary, we have a special case. Here we might also inspect the objects of triples whose predicate is, e.g., rdfs:subClassOf, to see whether the object is a class defined in some ontology. @jerdeb we should discuss whether we want to support this case.
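If we do support this case, one possible extension (sketched in Python under the same tuple-of-IRI-strings assumption, not taken from the project) is to treat a configurable set of predicates as "class-valued" and check their objects. The sketch also assumes that classes the dataset defines itself count as defined, so the dataset's own triples are merged into the known vocabulary; `CLASS_VALUED_PREDICATES` and `undefined_classes` are hypothetical names.

```python
# Sketch only: check objects of class-valued predicates, including rdfs:subClassOf.
# Assumptions: triples are (subject, predicate, object) tuples of IRI strings,
# and classes defined locally in the dataset count as defined.

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = RDF + "type"
CLASS_TYPES = {RDFS + "Class", OWL + "Class"}

# Predicates whose object is expected to be a defined class (hypothetical, extensible)
CLASS_VALUED_PREDICATES = {RDF_TYPE, RDFS + "subClassOf"}


def is_defined_class(cls, triples):
    """True if some triple types cls as rdfs:Class or owl:Class."""
    return any(s == cls and p == RDF_TYPE and o in CLASS_TYPES for s, p, o in triples)


def undefined_classes(dataset, vocab):
    """Objects of class-valued predicates that no known triple defines as a class."""
    known = set(vocab) | set(dataset)  # local class definitions are accepted
    return {
        o
        for s, p, o in dataset
        if p in CLASS_VALUED_PREDICATES
        and not o.startswith((RDF, RDFS, OWL))  # built-in terms are skipped
        and not is_defined_class(o, known)
    }


FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/vocab#"
vocab = {(FOAF + "Person", RDF_TYPE, OWL + "Class")}
dataset = [
    (EX + "Student", RDF_TYPE, OWL + "Class"),
    (EX + "Student", RDFS + "subClassOf", FOAF + "Person"),
    (EX + "Lecturer", RDF_TYPE, OWL + "Class"),
    (EX + "Lecturer", RDFS + "subClassOf", FOAF + "Persn"),  # misspelt superclass
]
print(undefined_classes(dataset, vocab))  # {'http://xmlns.com/foaf/0.1/Persn'}
```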

clange commented 10 years ago

BTW, @jerdeb, @nfriesen, does this issue (as well as #30) belong to a milestone, or is it something we should not prioritise before the July D3.2/D5.2 deadlines?

jerdeb commented 10 years ago

It would be good to have, but AFAIK it is not needed for the EBI use case.


clange commented 10 years ago

@jerdeb: let's prioritise as follows: as we have most of the metric already, let Ali fix the remaining (small) problems. I will (hopefully today) clean up the comment above and give Ali some concrete instructions in #30.

clange commented 10 years ago

@jerdeb in #30 I gave the final “set” of instructions for “fixing this metric in its current state”. I have a few more ideas for improving it further, which I wrote down as a separate, low-priority issue in #43. However, could you please review the original description of this issue once more? Just to make sure we have the right understanding of this metric.