W3C-HCLSIG / HCLSDatasetDescriptions

dataset statistics #99

Closed micheldumontier closed 9 years ago

micheldumontier commented 9 years ago

email from Norio KOBAYASHI

  1. Overall issues. The SPARQL queries in Section 6.6 seem to assume that a dataset and a named graph have a one-to-one correspondence. I think this is because the definition of the class void:Dataset is imprecise:

“dataset – A set of RDF triples that are published, maintained or aggregated by a single provider.”

Since SPARQL 1.1 Service Description allows a dataset to be associated with zero or more named graphs, the SPARQL queries with FROM clauses should be written as

SELECT … FROM * {…}

SELECT (COUNT(DISTINCT ?o) AS ?distinctClasses) FROM * { ?s a ?o } (In the draft, COUNT(...) is missing.)

In the draft, the corresponding RDF is described using the properties void:classPartition and void:distinctSubjects as follows:

:rdfdataset void:classPartition [ void:class rdfs:Class ; void:distinctSubjects "###"^^xsd:integer ] .

Consider the VoID definitions:

class partition – A subset of a void:Dataset that contains only the entities of a certain rdfs:Class.

distinct subjects – The total number of distinct subjects in a void:Dataset. In other words, the number of distinct resources that occur in the subject position of triples in the dataset.

Given these definitions, the integer given as the object of void:distinctSubjects in the RDF description is not the number of classes.

Therefore, the RDF description does not correspond to the semantics of the title. To solve this, we can define a novel property, for instance void-ext:explicitlyTypedEntities. With it, the RDF can be easily described as follows:

… void:classPartition [ void:class rdfs:Class ; void-ext:explicitlyTypedEntities "###"^^xsd:integer ] .

In case (2), the SPARQL query has to be extended to count unique classes, including classes inferred to be of type rdfs:Class under RDFS. However, since the property void:classes is already defined, the RDF description can be written as

:rdfdataset void:classes "###"^^xsd:integer .
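To make the mismatch concrete, here is a minimal Python sketch over a hypothetical in-memory list of triples. The data and the function names are invented for illustration and are not part of the draft. It contrasts the number of distinct classes used in rdf:type triples (what the COUNT(DISTINCT ?o) query returns) with the number of distinct subjects in the rdfs:Class partition (what the draft's void:classPartition / void:distinctSubjects description states under the VoID definitions).

```python
RDF_TYPE = "rdf:type"

# Hypothetical toy dataset: (subject, predicate, object) triples.
triples = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:aspirin", RDF_TYPE, "ex:Drug"),
    ("ex:Person", RDF_TYPE, "rdfs:Class"),
    ("ex:alice", "ex:takes", "ex:aspirin"),
]


def distinct_classes(triples):
    """Count distinct objects of rdf:type triples, like COUNT(DISTINCT ?o) over { ?s a ?o }."""
    return len({o for _, p, o in triples if p == RDF_TYPE})


def distinct_subjects_in_class_partition(triples, cls):
    """Count distinct subjects among the entities explicitly typed as cls."""
    members = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    return len({s for s, _, _ in triples if s in members})


print(distinct_classes(triples))                                    # 3
print(distinct_subjects_in_class_partition(triples, "rdfs:Class"))  # 1
```

In this toy data three classes are used to type entities, but only ex:Person is explicitly declared an rdfs:Class, so the two numbers differ.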

6.6.1.7 and 6.6.1.8 As with 6.6.1.6, the RDF descriptions in 6.6.1.7 and 6.6.1.8 do not correctly describe the numbers of unique literals and graphs, respectively. Similar to our proposal for 6.6.1.6, we can introduce novel properties, for instance void-ext:literals for 6.6.1.7 and void-ext:graphs for 6.6.1.8, since the semantics of these sections cannot be expressed using the existing VoID vocabulary.

6.6.2.1 Which of the following does the definition mean: (1) to specify the classes used to type entities and their entities in the dataset, or (2) to specify the classes and the number of their entities in the dataset?

In case (1), the SPARQL query is correct, but the RDF description does not provide the correct semantics since it does not include inferred entity-class relationships. If we define a novel property void-ext:explicitlyTypedEntities, then the RDF description can be written as follows:

:rdfdataset void:classPartition [ void:class ; void-ext:explicitlyTypedEntities "###"^^xsd:integer ] .

In case (2), though the SPARQL query has to be extended to infer rdfs:Class, the RDF description can be simply written as

:rdfdataset void:classPartition [ void:class ; void:entities "###"^^xsd:integer ] .
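The difference between the two cases can be sketched in Python as follows. This is only an illustration under stated assumptions: the type assertions and the class hierarchy are hypothetical, and the only inference modeled is rdfs:subClassOf propagation with a single parent per class. The explicit counts correspond to what the proposed void-ext:explicitlyTypedEntities would hold, and the inferred counts to void:entities when inferred typing is included.

```python
from collections import defaultdict

# Hypothetical explicit rdf:type assertions: (entity, class).
type_assertions = [
    ("ex:alice", "ex:Student"),
    ("ex:bob", "ex:Person"),
    ("ex:carol", "ex:Student"),
]

# Hypothetical rdfs:subClassOf links (child -> parent), one parent per class for brevity.
subclass_of = {"ex:Student": "ex:Person"}


def explicit_counts(assertions):
    """Entities per class, counting only explicitly asserted types."""
    members = defaultdict(set)
    for entity, cls in assertions:
        members[cls].add(entity)
    return {cls: len(entities) for cls, entities in members.items()}


def inferred_counts(assertions, parents):
    """Entities per class, also counting each entity under every superclass."""
    members = defaultdict(set)
    for entity, cls in assertions:
        seen = set()
        current = cls
        while current is not None and current not in seen:  # guard against cycles
            seen.add(current)
            members[current].add(entity)
            current = parents.get(current)
    return {cls: len(entities) for cls, entities in members.items()}


print(explicit_counts(type_assertions))
print(inferred_counts(type_assertions, subclass_of))
```

Here ex:Person has one explicitly typed entity but three entities once the subclass link is taken into account, which is why the two cases need different properties.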

Further, “ORDER BY” in the SPARQL query should be removed if ordering is not necessary in the semantics.

6.6.2.3 Which of the following does the title mean: (1) to specify the property, the number of triples having the property, and the number of unique, typed entities of each class included in the triples, or (2) to specify the property, the number of triples having the property, and the number of unique, explicitly typed entities of each class included in the triples?

In case (1), though the SPARQL query has to be extended to infer rdfs:Class, the RDF description can be written as

:rdfdataset void:propertyPartition [ void:property ; void:triples "###"^^xsd:integer ; void:classPartition [ void:class ; void:entities "###"^^xsd:integer ; ]].

In case (2), the SPARQL query is correct, but the RDF description cannot be easily written. If we introduce a novel property, void-ext:explicitlyTypedEntities, then the RDF description can be written as

:rdfdataset void:propertyPartition [ void:property ; void:triples "###"^^xsd:integer ; void:classPartition [ void:class ; void-ext:explicitlyTypedEntities "###"^^xsd:integer ; ]].

As another solution, if we define a new property void-ext:subjectClassPartition alongside the existing property void-ext:objectClassPartition, the RDF description can be written as

:rdfdataset void:propertyPartition [ void:property ; void:triples "###"^^xsd:integer ; void-ext:subjectClassPartition [ void:class ; void:entities "###"^^xsd:integer ; ]].

Considering 6.6.2.6 below, I prefer the latter solution, which introduces the property void-ext:subjectClassPartition.

6.6.2.4 Does the title mean “to specify the numbers of unique typed objects linked by a property and triples having the objects in the dataset”? The SPARQL query should be written as follows:

SELECT ?p (COUNT(?p) AS ?triples) ?otype (COUNT(DISTINCT ?o) AS ?ocount) FROM * { ?s ?p ?o . ?o a ?otype . FILTER (!isLiteral(?o)) } GROUP BY ?p ?otype
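As a rough illustration of what this query computes, the following Python sketch reproduces the grouping over a hypothetical in-memory list of triples. The data and the function name object_type_stats are invented for the example. Literals are represented as quoted strings and never appear as subjects, so they drop out just as they do in the query.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical toy dataset: (subject, predicate, object) triples.
triples = [
    ("ex:alice", "ex:takes", "ex:aspirin"),
    ("ex:bob", "ex:takes", "ex:aspirin"),
    ("ex:bob", "ex:takes", "ex:ibuprofen"),
    ("ex:aspirin", RDF_TYPE, "ex:Drug"),
    ("ex:ibuprofen", RDF_TYPE, "ex:Drug"),
    ("ex:alice", "ex:name", '"Alice"'),
]


def object_type_stats(triples):
    """Map (property, object type) to (matching triples, distinct typed objects)."""
    unique = set(triples)  # an RDF graph is a set, so drop duplicate triples
    types = defaultdict(set)
    for s, p, o in unique:
        if p == RDF_TYPE:
            types[s].add(o)

    rows = defaultdict(lambda: [0, set()])
    for s, p, o in unique:
        for otype in types.get(o, ()):  # one solution row per (triple, object type)
            rows[(p, otype)][0] += 1
            rows[(p, otype)][1].add(o)
    return {key: (count, len(objs)) for key, (count, objs) in rows.items()}


print(object_type_stats(triples))
```

For this data the only group is (ex:takes, ex:Drug), with three triples and two distinct objects.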

6.6.2.5 Does the title mean “to specify the numbers of unique literals related to a property and triples having the literals in the dataset”?

6.6.2.6 Does the title mean “to specify the number of unique typed subjects that are linked to unique typed objects for each subject-object class pair in the dataset”? If so, the RDF description should be as follows:

:rdfdataset void:propertyPartition [ void:property ; void-ext:subjectClassPartition [ void:class ; void-ext:objectClassPartition [ void:class ; void:distinctSubjects "###"^^xsd:integer ; void:distinctObjects "###"^^xsd:integer ; ]; ]; ] .
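Under that reading, the statistic for each property and subject-object class pair could be computed roughly as in the Python sketch below. The triples and the function name class_pair_stats are hypothetical, and only explicitly asserted types are considered.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical toy dataset: (subject, predicate, object) triples.
triples = [
    ("ex:alice", "ex:takes", "ex:aspirin"),
    ("ex:bob", "ex:takes", "ex:aspirin"),
    ("ex:alice", "ex:takes", "ex:ibuprofen"),
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:aspirin", RDF_TYPE, "ex:Drug"),
    ("ex:ibuprofen", RDF_TYPE, "ex:Drug"),
]


def class_pair_stats(triples):
    """Map (property, subject class, object class) to (distinct subjects, distinct objects)."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    groups = defaultdict(lambda: (set(), set()))
    for s, p, o in triples:
        for stype in types.get(s, ()):
            for otype in types.get(o, ()):  # skipped when the object is untyped
                subjects, objects = groups[(p, stype, otype)]
                subjects.add(s)
                objects.add(o)
    return {key: (len(subj), len(obj)) for key, (subj, obj) in groups.items()}


print(class_pair_stats(triples))
```

Each resulting entry corresponds to one nested subject-class and object-class partition, with the two counts filling void:distinctSubjects and void:distinctObjects.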

micheldumontier commented 9 years ago

email invite to Norio to join us next week.