Closed AlasdairGray closed 9 years ago
I see your point. I think it's OK to have it in the table, even though it is an inverse relation from a data item to a distribution URI. The guidance note should spell that out. Yes, the example needs to be fixed to point to an actual distribution URI.
Another related issue is that void:inDataset points from an RDF document URI to a void:Dataset URI. I had always assumed that it was between a resource URI and a dataset URI. So we may have to find another relation between a data item and a dataset.
+1 to Michel: void:inDataset is used to indicate that the triples serialised in an RDF document belong to the dataset (see http://www.w3.org/TR/void/). So, for the provenance of a data item, maybe we can use dct:isPartOf rather than void:inDataset?
The void:inDataset example is a bit of a gotcha in the VoID documentation. I'd completely missed the fact that it was the DBpedia data namespace rather than the resource namespace. This does alleviate one of the concerns that @JervenBolleman had around the number of additional triples he'd need to add to the UniProt data. However, it does not meet the use case requirement of linking a triple back to its dataset.
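To make the distinction concrete, here is a minimal Turtle sketch. The dataset IRI under the `ex:` prefix is a hypothetical placeholder, and the DBpedia URIs are used only to illustrate the data namespace versus the resource namespace.

```turtle
@prefix void: <http://rdfs.org/ns/void#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace for the dataset IRI

# VoID usage: the subject is the RDF *document* that serialises the triples.
<http://dbpedia.org/data/Berlin> a foaf:Document ;
    void:inDataset ex:DBpedia .

# What the use case above would need, but not what void:inDataset is defined for:
# <http://dbpedia.org/resource/Berlin> void:inDataset ex:DBpedia .
```

The commented-out triple is the reading that is easy to assume by mistake: it attaches the property to the resource URI instead of the document URI.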
@agbeltran unfortunately dct:isPartOf is used for a different interpretation, particularly in VoID datasets (see the note in Section 6.3 of the VoID Note).
I'm not aware of a similar property to void:inDataset.
Alasdair and I believe that while using dc:isPartOf is very natural for a relation between an assertion (a reified triple/RDF Statement, or an assertion graph in a nanopublication/ovopub) and a dataset, it is a bit more challenging for the relation between a subject, predicate, or object of a triple and a dataset. Here are some options we thought of:
dc:isPartOf : http://purl.org/dc/terms/isPartOf
sio:refers-to : http://semanticscience.org/resource/refers-to
or some new relation, which could easily be added to SIO or some other vocabulary.
thoughts?
If you want to relate a triple to a dataset, you need to reify the triple (or have another way of identifying a triple, e.g. single-member named graphs). Saying that a triple is in a dataset is properly in the scope of PROV-O. Making this part of the data explodes the dataset or description size for no real-world gain.
e.g.
# the original triple
uniprot:P05067 a up:Protein .
# its reification, linked to every release that contains it
[] rdf:subject uniprot:P05067 ;
   rdf:predicate rdf:type ;
   rdf:object up:Protein ;
   void:inDataset uniprotkb:release2014_11, uniprotkb:release2014_10, uniprotkb:release2014_09, ... swiss-prot:10.2 .
To say a resource is described or talked about in a dataset is in the scope of VoID/HCLS dataset descriptions. But it should be optional because, done correctly, it is a listing of all unique IRIs in a dataset. For UniProt that is about 2 billion values, and similar for ChEMBL etc. The listing of all unique IRIs in a dataset is interesting but not of high value to our users.
In the end I am doubtful that there is a solid, well-thought-out use case for void:inDataset or similar constructs that is not much better solved by using PROV-O in the original dataset (or by having a dataset consisting of nanopublications).
@JervenBolleman what is the PROV-O relation to say a triple is in a dataset?
This relation between a component of a triple and a dataset is already optional - you do not have to see it as a vital use case, although others, including myself, do. What we are doing is determining which relation should be used to express this.
void:inDataset has foaf:Document as its domain. So you would be inferring that the RDF statement is a document.
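A minimal sketch of that inference, taking the domain as stated in the comment above. The `ex:` dataset IRI and the blank node label are hypothetical, chosen only for illustration.

```turtle
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix void:    <http://rdfs.org/ns/void#> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix uniprot: <http://purl.uniprot.org/uniprot/> .
@prefix up:      <http://purl.uniprot.org/core/> .
@prefix ex:      <http://example.org/> .   # hypothetical namespace for the release IRI

# Domain axiom, as described in the comment above.
void:inDataset rdfs:domain foaf:Document .

# A reified statement that uses void:inDataset.
_:stmt a rdf:Statement ;
    rdf:subject   uniprot:P05067 ;
    rdf:predicate rdf:type ;
    rdf:object    up:Protein ;
    void:inDataset ex:release2014_11 .

# An RDFS reasoner would then also derive (rule rdfs2):
_:stmt a foaf:Document .
```

The last triple is the unintended consequence: the reified statement ends up typed as a document.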
@micheldumontier I am too tired, reading SHOULD as must. SHOULD is ok, although I would like OPTIONAL/MAY better.
PROV-O example, to my understanding:
# the original triple
uniprot:P05067 a up:Protein .
# the release has the reified statement as a member
uniprotkb:release2014_07 prov:hadMember [ rdf:subject uniprot:P05067 ;
   rdf:predicate rdf:type ;
   rdf:object up:Protein ;
   a prov:Entity, rdf:Statement ] .
But I would not be surprised if I am wrong in this interpretation.
I don't think we want to add so many triples. I'm going to propose the addition of a new set of relations (object properties) to SIO:
has-data-item / is-data-item-in: 'has data item' is a relation between a dataset and any entity described or referenced in it. 'is data item in' is the inverse: a relation between an entity that is described or referenced in a dataset and that dataset.
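A rough sketch of how the proposed pair might look in use. The two SIO term IRIs are guesses based on the labels above, since the terms are only proposed here, and the `ex:` release IRI is a hypothetical placeholder.

```turtle
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix sio:     <http://semanticscience.org/resource/> .
@prefix uniprot: <http://purl.uniprot.org/uniprot/> .
@prefix ex:      <http://example.org/> .   # hypothetical namespace for the release IRI

# Assumed term IRIs; the proposal does not fix them.
sio:has-data-item a owl:ObjectProperty ;
    owl:inverseOf sio:is-data-item-in .

# One triple links the entity to the dataset, with no reification needed.
uniprot:P05067 sio:is-data-item-in ex:release2014_11 .

# Equivalent statement from the dataset side.
ex:release2014_11 sio:has-data-item uniprot:P05067 .
```

Compared with reifying each triple, this costs one extra triple per entity and dataset, which is the saving the proposal is aiming for.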
Would it be possible to add usage guidelines to the descriptions of the new terms, e.g., when to use SIO:is-data-item-in and when to use void:inDataset?
How should the void:inDataset property be used? Currently we have it in the table as a SHOULD property for distribution level descriptions but this is misleading as the description should be pointed to from the data. How should we show this in the table?
The example in the guidance notes is wrong as it should be a specific resource in the dataset that is linked back to the description. The explanation should be extended to adequately explain the correct usage of the property.
This is not something that the validator can check.