DaniFdezAlvarez / shexer

Apache License 2.0

Adding statistics into ShEx annotation #133

Open yayamamo opened 1 year ago

yayamamo commented 1 year ago

There is a ShEx specification. It would be better to provide the statistics as ShEx annotations, since annotations are machine-readable. They could be expressed as follows:

ex:IssueShape {
  ex:status .
    // rdf:value "0.77"^^sio:SIO_001018 # http://semanticscience.org/resource/SIO_001018 (ratio)
    // rdf:value "800"^^sio:SIO_000794 # http://semanticscience.org/resource/SIO_000794 (count)
}

DaniFdezAlvarez commented 1 year ago

@yayamamo , I'd like to work on this, but I think I need more examples to generate a proper output.

First, what should we do with triple constraints whose cardinality range includes zero (such as * or ?)? This is an example of the current output:

:Person   # 3 instances
{
   rdf:type  [foaf:Person]  ;   # 3 instances.
   foaf:name  xsd:string  ?;
        # 2 instances. obj: xsd:string. Cardinality: {1}
}

I understand that rdf:type [foaf:Person] ; should be annotated with rdf:value "1"^^sio:SIO_001018 (ratio) and rdf:value "3"^^sio:SIO_000794 (count).
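As a minimal sketch of how such annotation lines could be rendered, assuming the syntax proposed in the first comment: `render_annotations` is a hypothetical helper written for illustration, not part of shexer's API.

```python
def render_annotations(count, total):
    """Return ShEx annotation lines for a constraint's ratio and instance count.

    Hypothetical helper: it follows the annotation syntax proposed in this
    issue, with SIO_001018 standing for ratio and SIO_000794 for count.
    """
    ratio = count / total
    return [
        '// rdf:value "%g"^^sio:SIO_001018' % ratio,  # ratio
        '// rdf:value "%d"^^sio:SIO_000794' % count,  # count
    ]


# rdf:type [foaf:Person] is satisfied by 3 out of 3 instances.
lines = render_annotations(3, 3)
for line in lines:
    print(line)
```

For the rdf:type constraint this yields one line with the ratio "1" and one with the count "3".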

But cases such as foaf:name xsd:string ?; bother me. It seems useful to annotate this constraint with a count of 2 and a ratio of 0.666. However, that information does not really describe the triple constraint "foaf:name xsd:string ?;" but rather "foaf:name xsd:string {1};", with a cardinality of exactly one. Once you put a "?" cardinality there, every instance complies with it, so the actual ratio of the constraint would be 1 and the actual count would be 3. I think that annotation would be semantically correct but not useful, since every extracted constraint would then always have ratio 1 and the same instance count.
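The difference between the two readings can be made concrete with a small sketch. The instance identifiers and the `count_and_ratio` helper are assumptions made for illustration; only the totals (3 instances, 2 of them with exactly one foaf:name) come from the example above.

```python
# Hypothetical data: number of foaf:name values per :Person instance.
names_per_instance = {"ex:alice": 1, "ex:bob": 1, "ex:carol": 0}


def count_and_ratio(values_per_instance, min_card, max_card):
    """Count instances whose number of values lies in [min_card, max_card]."""
    matching = sum(
        1 for n in values_per_instance.values() if min_card <= n <= max_card
    )
    return matching, matching / len(values_per_instance)


# Reading 1: statistics for the stricter constraint "foaf:name xsd:string {1}".
strict_count, strict_ratio = count_and_ratio(names_per_instance, 1, 1)

# Reading 2: statistics for the emitted constraint "foaf:name xsd:string ?",
# which accepts zero or one value and is therefore met by every instance.
optional_count, optional_ratio = count_and_ratio(names_per_instance, 0, 1)

print(strict_count, strict_ratio)
print(optional_count, optional_ratio)
```

The first reading gives a count of 2 and a ratio of two thirds; the second gives a count of 3 and a ratio of 1, which is the uninformative case described above.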

Also, there is another issue. I've been discussing a simple case, but let's say we get a shape such as the following one:

:Person   # 3 instances
{
   rdf:type  [foaf:Person]  ;   # 3 instances.
   foaf:name  xsd:string  *;
        # 2 instances. obj: xsd:string. Cardinality: {+}
        # 1 instance. obj: xsd:string. Cardinality: {2}
        # 1 instance. obj: xsd:string. Cardinality: {1}
}

For "foaf:name xsd:string *;", if we limit the annotations to a single case (it could be cardinality * or +), then there is no extra problem here. But if we try to produce annotations for more specific cardinalities (in this case, the ratio and count of instances having a cardinality of exactly one or exactly two), I wouldn't know how to organize the information.
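To show where the per-cardinality figures come from, here is a rough sketch that derives them from hypothetical instance data (one instance with two names, one with one, one with none). It only computes the numbers; how to attach them to a single triple constraint as annotations remains the open question.

```python
from collections import Counter

# Hypothetical data matching the shape above: three :Person instances.
names_per_instance = {"ex:alice": 2, "ex:bob": 1, "ex:carol": 0}

total = len(names_per_instance)

# Instances per exact cardinality, ignoring instances with no value at all.
exact = Counter(n for n in names_per_instance.values() if n > 0)

# Instances satisfying "+" (one or more values).
at_least_one = sum(exact.values())

# Map each cardinality label to (instance count, ratio over all instances).
stats = {"+": (at_least_one, at_least_one / total)}
for cardinality, count in sorted(exact.items(), reverse=True):
    stats["{%d}" % cardinality] = (count, count / total)

for label, (count, ratio) in stats.items():
    print(label, count, ratio)
```

Each cardinality ends up with its own count and ratio, so a single pair of rdf:value annotations per triple constraint cannot carry the whole breakdown without some extra way of saying which cardinality each value refers to.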

What's your opinion on these matters?