diachron / quality

Dataset Quality Assessment (part of WP5 of the Diachron EU FP7 project)
MIT License

(Metric Impl) Ontology is a member of the OBO Foundry #20

Closed: jerdeb closed this issue 10 years ago

jerdeb commented 10 years ago

This metric belongs to the Reputation dimension.

In this metric we need to check whether the ontologies used are part of the OBO Foundry (therefore we need to check only for the type of an instance).

For more information, see D5.1.

In the class comment, mention that this metric is specific to the EBI use case.

clange commented 10 years ago

@muhammadaliqasmi, I discussed this metric with @nfriesen and give you a more detailed implementation guide below. Please stop reading at the horizontal line. @jerdeb, below that line there are some further questions that still need clarification (you could look into the literature or even talk to the use case partners).

This metric is actually very similar to UndefinedClassesOrProperties. We need to compute the ratio of resources referenced in our dataset that are defined in ontologies that are members of the OBO Foundry.

@jerdeb's comment above that “we need to check only for the type of an instance” (i.e. only for the objects of triples whose predicate is rdf:type) is wrong: some OBO Foundry ontologies (e.g. http://svn.code.sf.net/p/obi/code/releases/2014-03-29/obi.owl) also define properties and individuals, so we need to apply the check described below to all predicates and objects of all triples in our dataset. (By the reasoning of #31, I think we can skip the subjects.)
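
A minimal Python sketch of the difference between the two readings, assuming triples are plain (subject, predicate, object) tuples of IRI strings and that literals are wrapped in a hypothetical `Literal` class (this is not the Jena/Diachron data model). The IRIs other than OBI_9991118 are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple, Union

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal; literals are never resources."""
    value: str


# Assumed triple shape: (subject IRI, predicate IRI, object IRI or Literal).
Triple = Tuple[str, str, Union[str, Literal]]


def type_objects_only(triples: Iterable[Triple]) -> List[str]:
    """Narrow reading: only the objects of rdf:type triples."""
    return [o for _s, p, o in triples if p == RDF_TYPE and not isinstance(o, Literal)]


def referenced_resources(triples: Iterable[Triple]) -> Iterator[str]:
    """Broader reading: every predicate and every IRI object; subjects are skipped."""
    for _subject, predicate, obj in triples:
        yield predicate
        if not isinstance(obj, Literal):
            yield obj


# Illustrative data; BFO_0000050 is used here simply as an OBO-namespace property IRI.
example = [
    ("http://example.org/s1", RDF_TYPE, "http://purl.obolibrary.org/obo/OBI_9991118"),
    ("http://example.org/s1", "http://purl.obolibrary.org/obo/BFO_0000050", "http://example.org/s2"),
    ("http://example.org/s2", RDFS_LABEL, Literal("sample")),
]
```

Here `type_objects_only(example)` returns only the OBI class, whereas `referenced_resources(example)` also yields the OBO-namespace predicate, which the rdf:type-only check would never look at.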

Therefore, @muhammadaliqasmi, I think we can do the following:

  1. Check whether the URI starts with a prefix from a hard-coded list, which for now contains http://purl.obolibrary.org/obo/ as its only element.
  2. Check, using VocabularyReader and similarly to the UndefinedClassesOrProperties metric, whether the class/property with this URI is defined.

From this we know that

E.g., for http://purl.obolibrary.org/obo/OBI_9991118, this is the case.
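
The two steps can be sketched in Python roughly as follows. This is an illustration under assumptions, not the Diachron code: VocabularyReader's real API is not reproduced here, so the "is this term defined?" lookup is passed in as a hypothetical `is_defined` callable, and every occurrence of a resource is counted (deduplicate first if distinct resources are wanted).

```python
from typing import Callable, Iterable

# Hard-coded list of OBO Foundry namespace prefixes; for now it has one entry.
OBO_FOUNDRY_PREFIXES = ("http://purl.obolibrary.org/obo/",)


def is_obo_foundry_term(uri: str, is_defined: Callable[[str], bool]) -> bool:
    """Step 1: prefix check. Step 2: the term must actually be defined."""
    if not uri.startswith(OBO_FOUNDRY_PREFIXES):  # str.startswith accepts a tuple
        return False
    # Hypothetical stand-in for the VocabularyReader lookup.
    return is_defined(uri)


def obo_foundry_ratio(uris: Iterable[str], is_defined: Callable[[str], bool]) -> float:
    """Share of referenced resources that pass both checks."""
    uris = list(uris)
    if not uris:
        return 0.0  # assumption: a dataset with no referenced resources scores 0
    hits = sum(1 for u in uris if is_obo_foundry_term(u, is_defined))
    return hits / len(uris)


# Usage with a hypothetical in-memory stand-in for the vocabulary lookup.
known_terms = {"http://purl.obolibrary.org/obo/OBI_9991118"}
ratio = obo_foundry_ratio(
    ["http://purl.obolibrary.org/obo/OBI_9991118", "http://example.org/foo"],
    known_terms.__contains__,
)
```

Doing the cheap prefix test first means the (potentially expensive) vocabulary lookup only runs for URIs in an OBO namespace.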

----

@jerdeb, the open questions start here:

Should this metric return "true or false", or a ratio (i.e. how many of the classes/properties (and individuals?) used in a dataset come from ontologies in the OBO Foundry)? A ratio makes more sense IMHO, because with "true or false" the question is when we should return true: if all classes/properties/individuals that our dataset reuses come from OBO Foundry ontologies, or if only some of them do?
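
To make the trade-off concrete, here is a small Python sketch (a hypothetical helper, not part of the codebase) that aggregates per-resource results under the three readings. A dataset where 1 of 100 resources comes from OBO Foundry ontologies and one where 99 of 100 do get the same answer under both boolean readings; only the ratio tells them apart.

```python
from typing import Dict, Sequence, Union


def aggregate(flags: Sequence[bool]) -> Dict[str, Union[bool, float]]:
    """flags[i] is True if the i-th referenced resource is from an OBO Foundry ontology."""
    n = len(flags)
    hits = sum(flags)  # True counts as 1, False as 0
    return {
        "all": n > 0 and hits == n,       # strict boolean reading
        "any": hits > 0,                  # lenient boolean reading
        "ratio": hits / n if n else 0.0,  # graded reading; empty dataset -> 0.0 (assumption)
    }


few = aggregate([True] + [False] * 99)
most = aggregate([True] * 99 + [False])
```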

Secondly, is the above “OBO Foundry membership check” actually right? Dereferencing http://purl.obolibrary.org/obo/OBI_9991118 takes me to http://www.ontobee.org/browser/rdf.php?o=OBI&iri=http://purl.obolibrary.org/obo/OBI_9991118 (and it also returns meaningful RDF/XML to a linked data client). So I think our job is to check whether the data we obtain by dereferencing that URI has a certain structure that looks like “OBO Foundry”.
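
A minimal Python sketch of dereferencing an OBO PURL the way a linked data client would: asking for RDF/XML via the Accept header and following HTTP redirects (urllib's default opener does the latter and keeps the Accept header on the redirected request). What structure "looks like OBO Foundry" is still the open question, so the sketch stops at fetching; the function names are hypothetical.

```python
import urllib.request
from typing import Tuple

RDF_XML = "application/rdf+xml"


def build_request(uri: str) -> urllib.request.Request:
    """Request RDF/XML via content negotiation, as a linked data client would."""
    return urllib.request.Request(uri, headers={"Accept": RDF_XML})


def dereference(uri: str, timeout: float = 10.0) -> Tuple[str, bytes]:
    """Fetch the URI, following HTTP redirects; return (final URL, raw body)."""
    with urllib.request.urlopen(build_request(uri), timeout=timeout) as resp:
        return resp.geturl(), resp.read()


# Usage (performs network I/O):
# final_url, body = dereference("http://purl.obolibrary.org/obo/OBI_9991118")
```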

I don't know whether this mechanism

muhammadaliqasmi commented 10 years ago

This metric detects non-reputable resources by retrieving the URIs of resources from the dataset and matching them against the prefix "http://purl.obolibrary.org/obo/".

Metric value = (total number of non-reputable resources) / (total number of resources)

Metric value range: [0, 1]; best case: 0, worst case: 1
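
A minimal Python sketch of the metric as described in this comment (prefix match only, no VocabularyReader check). The function name is hypothetical, and returning 0.0 for a dataset with no resources is an assumption.

```python
from typing import Iterable

OBO_PREFIX = "http://purl.obolibrary.org/obo/"


def non_reputable_ratio(resource_uris: Iterable[str]) -> float:
    """(number of non-reputable resources) / (total number of resources)."""
    total = 0
    non_reputable = 0
    for uri in resource_uris:
        total += 1
        if not uri.startswith(OBO_PREFIX):  # anything outside the OBO namespace
            non_reputable += 1
    # Range [0, 1]: 0 is the best case, 1 the worst; empty input -> 0.0 (assumption).
    return non_reputable / total if total else 0.0
```

Unlike the two-step check proposed earlier in the thread, this only matches the prefix, so an undefined term under the OBO namespace would still count as reputable.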

(For further changes/improvements, kindly reopen this issue.)
