Closed VladimirAlexiev closed 9 years ago
Cannot comment on propertyPartition, classPartition and distinctSubjects, but the use of LinkSet and linkPredicate appears to be wrong.
From http://www.w3.org/TR/void/#linkset:
VoID also allows the description of RDF links between datasets. An RDF link is an RDF triple whose subject and object are described in different datasets.
and
The property void:linkPredicate can be used to specify the type of links that connect two datasets. In other words, it names the RDF property in the predicate position of the link triples.
The following example uses void:linkPredicate to state that the DBpedia and Geonames datasets are linked by triples that have the owl:sameAs predicate:
disagree. a void:Dataset is a set of RDF triples. a void:Linkset is a collection of RDF triples between two datasets. Therefore, we can create Linksets between any arbitrary datasets.
Yes, but the section I'm quoting doesn't talk about 2 datasets. It appears to want to provide some stats of 1 dataset, and uses wrong class and property. See http://www.w3.org/TR/void/#class-property-partitions (as opposed to http://www.w3.org/TR/void/#describing-linksets)
A dataset is any set of triples. in the formulation for the enhanced statistics, we describe a set of relations (i.e. linkset) between arbitrary partitions of a dataset. each partition is a dataset in its own right (see void:subset). i think this approach is justifiable, and falls within the scope of VoID constructs provided. You seem not to agree - could you provide an alternative formulation?
we describe a set of relations (i.e. linkset) between arbitrary partitions of a dataset.
Not true. Eg section properties and the number of unique objects linked to the property shows this query:
SELECT ?p (COUNT(DISTINCT ?o ) AS ?count ) { ?s ?p ?o } GROUP BY ?p
Where do you see 2 arbitrary (i.e. independent) partitions here?
The right way to express this is (see http://www.w3.org/TR/void/#statistics):
:rdfdataset
void:propertyPartition [
void:property <property-uri> ;
void:distinctObjects "###"^^xsd:integer] .
This counts any objects (URIs, blank nodes, literals), as per the above query and the VoID spec. If you want to count only resources, see http://www.w3.org/TR/void/#class-property-partitions and use rdfs:Resource (not rdfs:Class):
:rdfdataset
void:propertyPartition [void:property <property-uri> ;
void:classPartition [void:class rdfs:Resource;
void:distinctObjects "###"^^xsd:integer]].
The key to understanding the above is that both void:propertyPartition and void:classPartition create sub-datasets, which are sets of triples. So it's legitimate to speak of the void:distinctObjects of those triples.
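As a rough illustration of what the two descriptions above would count, here is a plain-Python sketch over a toy list of triples. The `Literal` wrapper, the sample data and the function name are assumptions made up for this example; they are not part of VoID or of any RDF library. The "resources only" variant simply skips literal objects, which is one possible reading of the rdfs:Resource restriction.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical marker type for an object that is a literal, not a resource."""
    value: str


# Toy dataset of (subject, predicate, object) triples; all names are invented.
TRIPLES = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:a", "ex:knows", "ex:c"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:a", "ex:name", Literal("Alice")),
    ("ex:b", "ex:name", Literal("Bob")),
    ("ex:c", "ex:name", Literal("Bob")),
    ("ex:a", "ex:seeAlso", "ex:b"),
    ("ex:a", "ex:seeAlso", Literal("a note")),
]


def distinct_objects(triples, resources_only=False):
    """Return {property: number of distinct objects}, mirroring
    SELECT ?p (COUNT(DISTINCT ?o) AS ?count) { ?s ?p ?o } GROUP BY ?p.
    With resources_only=True, literal objects are ignored."""
    objects = defaultdict(set)
    for _s, p, o in triples:
        if resources_only and isinstance(o, Literal):
            continue
        objects[p].add(o)
    return {p: len(values) for p, values in objects.items()}


all_counts = distinct_objects(TRIPLES)
resource_counts = distinct_objects(TRIPLES, resources_only=True)
print(all_counts)
print(resource_counts)
```

Each value in `all_counts` would go into void:distinctObjects of the matching void:propertyPartition; `resource_counts` corresponds to the nested variant. A property with only literal objects (ex:name here) does not appear in `resource_counts` at all.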
We need to specify 1 - the property 2 - the subject class partition 3 - the object class partition
so the reason we started using the linkset was because of "void:subjectsTarget" and "void:objectsTarget" to specify both the subject and target class partitions. Can you elaborate on how we can get this kind of functionality using a void:propertyPartition?
Dear Michel,
I cannot see any query in the quoted section that reports on property and two classes. The closest query that I see is: unique subject types that are linked through a property to unique object types:
SELECT (COUNT(DISTINCT ?s ) AS ?scount ) ?p (COUNT(DISTINCT ?o ) AS ?ocount ) { ?s ?p ?o } GROUP BY ?p
It counts distinct subjects and objects per property. This can be reported as follows:
:rdfdataset
void:propertyPartition [
void:property <property-uri> ;
void:distinctSubjects "###"^^xsd:integer ;
void:distinctObjects "###"^^xsd:integer] .
However, the same query seems to want to (incorrectly) report on property and two classes:
:rdfdataset
void:subset [
a void:LinkSet ;
void:linkPredicate <property-uri> ;
void:subjectsTarget [
void:class <subject-type-uri> ;
void:entities "###"^^xsd:integer ;
void:objectsTarget [
void:class <object-type-uri> ;
void:entities "###"^^xsd:integer]]].
To make such a report, you need to use the http://ldf.fi/void-ext ontology (see here for a tool implementing such counts: http://jiemakel.github.io/aether/, and a paper explaining it), eg like this:
:rdfdataset
void:propertyPartition [void:property <property-uri> ;
void:classPartition [void:class <subject-class-uri>;
void-ext:objectClassPartition [void:class <object-class-uri>;
void:triples "###"^^xsd:integer]]].
Above we use:
Note that if you have some subclass or subproperty inference in the repository, those partitions won't be exclusive...
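To make the nested report above more concrete, here is a minimal plain-Python sketch of the count it describes: the number of triples per (property, subject class, object class). The toy data and the function name are assumptions for illustration only, and this is not how the Aether tool computes its statistics.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"

# Toy dataset of (subject, predicate, object) triples; all names are invented.
TRIPLES = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:acme", RDF_TYPE, "ex:Company"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),  # stands in for a literal: it has no rdf:type
]


def nested_partition_counts(triples):
    """Count triples per (property, subject class, object class)."""
    # First pass: collect the declared rdf:type values of every node.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    # Second pass: a triple counts once for every pair of
    # (class of its subject, class of its object).
    counts = Counter()
    for s, p, o in triples:
        for subject_class in types.get(s, ()):
            for object_class in types.get(o, ()):
                counts[(p, subject_class, object_class)] += 1
    return dict(counts)


partition_counts = nested_partition_counts(TRIPLES)
for key, n in sorted(partition_counts.items()):
    print(key, n)
```

Each entry of `partition_counts` would become one void:triples value inside the nested propertyPartition / classPartition / objectClassPartition structure. A node with several types (for example through subclass inference) contributes to several entries, which is why the partitions are not exclusive in that case.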
so the objectClassPartition is a property of the classPartition? and the void:triples are associated with the objectClassPartition? strange.
void-ext:objectClassPartition is analogous to void:classPartition: they make a subset (both are subprops of void:subset). The difference is that objectClassPartition restricts the Objects of triples in the subset, whereas classPartition restricts the Subjects.
This needs to be qualified: http://www.w3.org/TR/void/#class-property-partitions says "The (classPartition) contains all triples that describe entities that have this class as their rdf:type". Is it true that the word "describe" means "have as subject"? SPARQL deliberately leaves freedom about how a "DESCRIBE ?s" query is implemented. Most repos return Concise Bounded Description (CBD), which includes all "?s ?p ?o" triples, but also all triples "?s ?p1 ?blank. ?blank ?p2 ?o" where ?blank is a blank node (recursively); and "?statement rdf:subject ?s. ?statement ?p ?o" (i.e. all reified statements about ?s). Others even return Symmetric CBD, which includes statements where ?s is Object.
objectClassPartition is a property of the classPartition?
No: objectClassPartition can be applied against any void:Dataset, no matter whether it's the result of a partition or not. The subsets being void:Dataset, you can subdivide them further. You can swap the order/nesting of the propertyPartition, classPartition, objectClassPartition and still get almost the same results. At each level, you need to describe the parameter of partition: void:property and void:class (twice).
By "almost" I refer to the ambiguity of "describe" above. You also need to be careful about literals: if your repo does not automagically declare all literals to be of class rdfs:Literal, then objectClassPartition will skip all data triples (having a literal as their object). And "declare literals as rdfs:Literal" means eg "123 a rdfs:Literal" which is weird, because in RDF 1.0 literals cannot be the subject of a statement (maybe RDF 1.1 allows that).
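The two points above (nesting order and literals) can be sketched in plain Python. This assumes the strict reading in which classPartition restricts subjects and objectClassPartition restricts objects, and it assumes that types are looked up in the full dataset rather than in the current subset. All function names and data are hypothetical.

```python
from itertools import permutations

RDF_TYPE = "rdf:type"

# Toy dataset; every name here is invented for the example.
DATASET = {
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:acme", RDF_TYPE, "ex:Company"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),  # stands in for a literal: it has no rdf:type
}

# Assumption: types come from the full dataset, not from the current subset.
TYPES = {}
for s, p, o in DATASET:
    if p == RDF_TYPE:
        TYPES.setdefault(s, set()).add(o)


def property_partition(subset, prop):
    return {t for t in subset if t[1] == prop}


def class_partition(subset, cls):
    # Strict reading: keep triples whose SUBJECT has the class.
    return {t for t in subset if cls in TYPES.get(t[0], ())}


def object_class_partition(subset, cls):
    # Keep triples whose OBJECT has the class.
    return {t for t in subset if cls in TYPES.get(t[2], ())}


STEPS = [
    lambda d: property_partition(d, "ex:worksFor"),
    lambda d: class_partition(d, "ex:Person"),
    lambda d: object_class_partition(d, "ex:Company"),
]


def apply_in_order(steps):
    subset = set(DATASET)
    for step in steps:
        subset = step(subset)
    return frozenset(subset)


# Try all six nesting orders and collect the distinct outcomes.
results = {apply_in_order(order) for order in permutations(STEPS)}

# The triple with the untyped literal object never survives an object-class filter.
name_triple = ("ex:alice", "ex:name", "Alice")
literal_survives = any(
    name_triple in object_class_partition(DATASET, cls)
    for cls in ("ex:Person", "ex:Company")
)
print(len(results), literal_survives)
```

Under these assumptions every nesting order selects the same two ex:worksFor triples. With a looser reading of "describe" (for example CBD), `class_partition` would select more triples and the orders could differ.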
Hi, ok, i modified the relevant structures - see the diff here : https://github.com/joejimbo/HCLSDatasetDescriptions/compare/statistics
how does that look?
@VladimirAlexiev can you have a look at the diff?
void:entities "###"^^xsd:integer ;
]
].
Use that:
void:entities "###"^^xsd:integer ]].
Cheers!
@VladimirAlexiev ok, i have made the edits. can you verify the correctness for each statistic?
Thanks for adding me to the contributors! Could you please change it to this:
<dd>Vladimir Alexiev, Ontotext Corp, Bulgaria <<a href="mailto:vladimir.alexiev@ontotext.com">vladimir.alexiev@ontotext.com</a>></dd>
done.
Please ensure that the examples both within the document and hcls.ttl are updated. (Relates to issue #89)
I'll look at the IO Informatics use case and will harmonize it in accordance with the guidelines
@egombocz I think your comment relates to issue #74
I sent a note to Vladimir asking him to verify what Michel did (followup to https://github.com/joejimbo/HCLSDatasetDescriptions/issues/81#issuecomment-61188841).
@VladimirAlexiev can you have another look at the latest?
refactored statistics have now been merged as per commit e85578a9da34c2022b971141dcdb386437d3d7a4
Sec 6.6.2 uses LinkSet to provide
This is totally wrong: void:LinkSet and void:linkPredicate are used to describe links between datasets, not counts within one dataset. You should use void:propertyPartition (and maybe void:classPartition within it) and void:distinctSubjects.