Open yayamamo opened 9 years ago
The intent of 6.6.2.6 is to capture the total number of triples between subjects and objects of a specified type; e.g. 100 distinct subjects may be connected to 10 distinct objects via 100 triples. One way of dealing with the total number of triples between subjects and objects of a certain type would be to simply declare a property partition on "rdfs:property".
I do not fully understand what it means to declare a property partition on "rdfs:property". My previous comment may have been vague. The statistic I would like to know is the number of triples with a certain predicate that connect specific classes (i.e., :c1 and :c2 in the example below). If the predicate connects only these classes, that number is identical to the total count for the predicate.
SELECT ?p (COUNT(?p) AS ?rc)
WHERE {
  GRAPH :graph {
    ?s ?p ?o .
    ?s a :c1 .
    ?o a :c2 .
  }
}
GROUP BY ?p
6.6.2.6 does just this, does it not?
I don't think so. The difference is what I wrote at the top of this comment: the former is what 6.6.2.6 reports, and the latter is what the query I wrote just above returns.
count(distinct ?s) = 100, count(distinct ?o) = 1, count(?p) = 100
One extreme example is 100 different subjects that share an identical property. The former reports 100 distinctSubjects and 1 distinctObject(s), while the latter reports 100 triples.
count(distinct ?s) = 10, count(distinct ?o) = 10, count(?p) = 100
Another example is 10 different subjects, each of which has an identical set of 10 properties. The former reports 10 distinctSubjects and 10 distinctObjects, while the latter reports 100 triples.
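The two cases above can be checked with a small Python sketch. It is only an illustration, not part of the spec: the `summarize` helper, the node names, and the third one-to-one case are assumptions added here to show that the same distinct counts can go with very different triple counts.

```python
def summarize(triples):
    """Return (distinct subjects, distinct objects, triple count)."""
    unique = set(triples)  # an RDF graph is a set, so duplicate triples collapse
    subjects = {s for s, _, _ in unique}
    objects = {o for _, _, o in unique}
    return len(subjects), len(objects), len(unique)


# Case 1: 100 subjects all point at the same object via the same predicate.
case1 = [(f"s{i}", "p", "o0") for i in range(100)]

# Case 2: each of 10 subjects points at the same 10 objects.
case2 = [(f"s{i}", "p", f"o{j}") for i in range(10) for j in range(10)]

# Case 3 (hypothetical, not from the thread): 10 subjects paired one-to-one
# with 10 objects. The distinct counts match case 2, the triple count does not.
case3 = [(f"s{i}", "p", f"o{i}") for i in range(10)]

print(summarize(case1))  # (100, 1, 100)
print(summarize(case2))  # (10, 10, 100)
print(summarize(case3))  # (10, 10, 10)
```

Cases 2 and 3 have identical distinctSubjects and distinctObjects values, so the triple count has to be recorded separately if it is to be recoverable.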
So 6.6.2.2 talks about properties and the number of triples. This query is not, however, limited to subjects and objects being of some particular type; we imagine that this is necessarily true.
SELECT ?p (COUNT(?p) AS ?triples) { ?s ?p ?o } GROUP BY ?p
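As a rough illustration of how the unrestricted per-predicate count differs from the class-restricted one, here is a Python sketch over a toy graph. The graph contents, the `types_of` helper, and the plain-string node names are all assumptions made for the example.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

# Hypothetical toy graph: "p" links c1 -> c2 twice and c1 -> c3 once.
triples = {
    ("a1", RDF_TYPE, "c1"),
    ("a2", RDF_TYPE, "c1"),
    ("b1", RDF_TYPE, "c2"),
    ("x1", RDF_TYPE, "c3"),
    ("a1", "p", "b1"),
    ("a2", "p", "b1"),
    ("a1", "p", "x1"),
}


def types_of(node):
    """Return the set of classes asserted for a node via rdf:type."""
    return {o for s, p, o in triples if s == node and p == RDF_TYPE}


# Unrestricted: every triple is counted under its predicate.
all_counts = Counter(p for _, p, _ in triples)

# Restricted: only triples whose subject is a c1 and whose object is a c2.
typed_counts = Counter(
    p for s, p, o in triples
    if "c1" in types_of(s) and "c2" in types_of(o)
)

print(all_counts["p"])    # 3
print(typed_counts["p"])  # 2
```

The two numbers agree only when the predicate connects nothing but the two chosen classes.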
That is to say, would 6.6.2.6 be as follows?
:rdfdataset
  void:propertyPartition [
    void:property <property-uri> ;
    void:triples "###"^^xsd:integer ;
    void:classPartition [
      void:class <subject-class-uri> ;
      void:distinctSubjects "###"^^xsd:integer ;
    ];
    void-ext:objectClassPartition [
      void:class <object-class-uri> ;
      void:distinctObjects "###"^^xsd:integer ;
    ];
  ] .
SELECT (COUNT(DISTINCT ?s) AS ?scount) ?stype ?p (COUNT(?p) AS ?pcount) ?otype (COUNT(DISTINCT ?o) AS ?ocount)
{
  ?s ?p ?o .
  ?s a ?stype .
  ?o a ?otype .
} GROUP BY ?p ?stype ?otype
Yes, that's right.
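For readers who want to see the grouping spelled out, the following Python sketch mimics the aggregation by property, subject class, and object class over an in-memory list of triples. It is a simplified stand-in for the query, not an implementation of the spec; `partition_stats` and the sample data are hypothetical.

```python
from collections import defaultdict


def partition_stats(triples, rdf_type="rdf:type"):
    """Group triples by (predicate, subject class, object class).

    Returns {(p, stype, otype): (distinct subjects, distinct objects, triples)}.
    """
    # Collect the asserted classes of every typed node.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == rdf_type:
            types[s].add(o)

    groups = defaultdict(lambda: {"subjects": set(), "objects": set(), "triples": 0})
    for s, p, o in set(triples):  # de-duplicate, as an RDF graph would
        # A node with several classes contributes to one group per class pair.
        for stype in types.get(s, ()):
            for otype in types.get(o, ()):
                group = groups[(p, stype, otype)]
                group["subjects"].add(s)
                group["objects"].add(o)
                group["triples"] += 1

    return {
        key: (len(g["subjects"]), len(g["objects"]), g["triples"])
        for key, g in groups.items()
    }


# Sample data: 10 subjects of class c1, each linked via "p" to the same
# 10 objects of class c2.
data = [(f"s{i}", "rdf:type", "c1") for i in range(10)]
data += [(f"o{j}", "rdf:type", "c2") for j in range(10)]
data += [(f"s{i}", "p", f"o{j}") for i in range(10) for j in range(10)]

stats = partition_stats(data)
print(stats)  # {('p', 'c1', 'c2'): (10, 10, 100)}
```

Each result tuple corresponds to the ?scount, ?ocount, and ?pcount columns for one (?p, ?stype, ?otype) group.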
Hi. Section 6.6.2.6 of the spec defines the numbers of distinct subjects and distinct objects with respect to a predicate. This shows one aspect of the triples connecting two classes, but another aspect cannot be obtained from it: the number of unique triples connecting the two classes. More precisely, this is the number of unique triples that connect typed subjects and objects belonging to the respective classes.
One extreme example is 100 different subjects that share an identical property. The former reports 100 distinctSubjects and 1 distinctObject(s), while the latter reports 100 triples. Another example is 10 different subjects, each of which has an identical set of 10 properties. The former reports 10 distinctSubjects and 10 distinctObjects, while the latter reports 100 triples.
I think the latter statistic is also useful for understanding the characteristics of the target dataset, and I feel it was in the document before, wasn't it?