ffdev-info / wikidp-issues

An issues repository for resolving issues in Wikidata around the records relating to Digital Preservation
GNU General Public License v3.0

Understand asterisk usage in Wikidata query "?uri wdt:P31/wdt:P279* wd:Q235557." #24

Closed ross-spencer closed 1 year ago

ross-spencer commented 3 years ago

Description of problem

Currently we use ?uri wdt:P31/wdt:P279* wd:Q235557. to return items that are an instance of file format or of one of its subclasses. We can return more results, and thus more signatures, with ?uri wdt:P31*/wdt:P279* wd:Q235557. (note the asterisk after wdt:P31).

Without asterisk:

{
  "AllSparqlResults": 14911,
  "CondensedSparqlResults": 13052,
  "SparqlRowsWithSigs": 9962,
  "RecordsWithPotentialSignatures": 9249,
  "FormatsWithBadHeuristics": 20,
  "RecordsWithSignatures": 9229,
  "MultipleSequences": 11,
  "AllLintingMessages": [
    "Use the `-wikidataDebug` flag to build the identifier to see linting messages"
  ],
  "AllLintingMessageCount": 200,
  "RecordCountWithLintingMessages": 151
}

With asterisk:

{
  "AllSparqlResults": 15275,
  "CondensedSparqlResults": 13350,
  "SparqlRowsWithSigs": 10031,
  "RecordsWithPotentialSignatures": 9274,
  "FormatsWithBadHeuristics": 20,
  "RecordsWithSignatures": 9254,
  "MultipleSequences": 11,
  "AllLintingMessages": [
    "Use the `-wikidataDebug` flag to build the identifier to see linting messages"
  ],
  "AllLintingMessageCount": 219,
  "RecordCountWithLintingMessages": 168
}
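One possible reading of the extra rows, sketched on a toy graph rather than on live Wikidata: wdt:P31* also matches a path of zero instance-of hops, so the classes themselves (file format and its subclasses) become results alongside their instances. The item names below are hypothetical stand-ins, and whether this accounts for every additional row in the counts above is an assumption that has not been checked against the real data.

```python
# Toy triples; the item names are hypothetical stand-ins, not real Wikidata IDs.
TRIPLES = {
    ("ImageFormat", "P279", "FileFormat"),
    ("RasterFormat", "P279", "ImageFormat"),
    ("ArchiveFormat", "P279", "FileFormat"),
    ("PNG", "P31", "RasterFormat"),
    ("TIFF", "P31", "FileFormat"),
}


def subjects_of(targets, prop):
    """One hop backwards: every s with (s, prop, o) for some o in targets."""
    return {s for (s, p, o) in TRIPLES if p == prop and o in targets}


def zero_or_more(targets, prop):
    """Backwards evaluation of `prop*`: the targets plus everything that reaches them."""
    seen, frontier = set(targets), set(targets)
    while frontier:
        frontier = subjects_of(frontier, prop) - seen
        seen |= frontier
    return seen


# ?c wdt:P279* FileFormat
classes = zero_or_more({"FileFormat"}, "P279")
# ?uri wdt:P31/wdt:P279* FileFormat  (exactly one instance-of hop)
without_star = subjects_of(classes, "P31")
# ?uri wdt:P31*/wdt:P279* FileFormat (zero or more instance-of hops)
with_star = zero_or_more(classes, "P31")

print(sorted(without_star))
print(sorted(with_star - without_star))
```

In this sketch the second query returns everything the first does, plus the four class items, including the file format item itself.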

Example formats we retrieve with the asterisk are:

BertrandCaron commented 1 year ago

The asterisk stands for "0 to n" P279 properties. The query will return the results of:

?uri wdt:P31 wd:Q235557.
?uri wdt:P31/wdt:P279 wd:Q235557.
?uri wdt:P31/wdt:P279/wdt:P279 wd:Q235557.
?uri wdt:P31/wdt:P279/wdt:P279/wdt:P279 wd:Q235557.
etc.

Though P279 is declared a transitive property, inference does not seem to work as you have different results with or without the asterisk...
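The "0 to n" expansion can be illustrated with a minimal sketch on a toy graph. It evaluates the fixed-length paths one depth at a time and takes their union, which is what the starred path returns when the graph has no cycles. The item names and the helper p31_then_n_p279 are hypothetical and exist only for this illustration.

```python
# Toy triples; the item names are hypothetical stand-ins, not real Wikidata IDs.
TRIPLES = {
    ("TIFF", "P31", "FileFormat"),
    ("JPEG", "P31", "ImageFormat"),
    ("PNG", "P31", "RasterFormat"),
    ("RasterFormat", "P279", "ImageFormat"),
    ("ImageFormat", "P279", "FileFormat"),
}


def subjects_of(targets, prop):
    """One hop backwards: every s with (s, prop, o) for some o in targets."""
    return {s for (s, p, o) in TRIPLES if p == prop and o in targets}


def p31_then_n_p279(target, n):
    """?uri wdt:P31/wdt:P279/.../wdt:P279 target, with exactly n P279 hops."""
    classes = {target}
    for _ in range(n):
        classes = subjects_of(classes, "P279")
    return subjects_of(classes, "P31")


# Depth 0 is the plain wdt:P31 query; each further depth adds one wdt:P279 hop.
by_depth = {n: p31_then_n_p279("FileFormat", n) for n in range(4)}
union = set().union(*by_depth.values())

for n, found in by_depth.items():
    print(n, sorted(found))
print(sorted(union))
```

Each depth finds a different item here, and depth 3 finds nothing because the toy hierarchy is only two subclass levels deep.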

ross-spencer commented 1 year ago

Oh interesting. Thanks Bertrand.

It seems that there's a good example on Wikibooks using Bach:

With a +: https://w.wiki/66uj
With a *: https://w.wiki/66um
Without: https://w.wiki/66up

One example difference without * is Johann August Bach: https://www.wikidata.org/wiki/Q66587214 who is JSB's grandchild and the child of Carl Philipp Emanuel Bach. Without the * or + we just get JSB's direct descendants (children).

https://en.wikibooks.org/w/index.php?title=SPARQL/Property_paths&oldid=3476967
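A rough sketch of the three variants, using only the three people named above and treating P40 (child) as a plain edge list. This is an in-memory illustration, not the queries behind the short links; the function names are made up for the example.

```python
JSB = "Johann Sebastian Bach"
CPE = "Carl Philipp Emanuel Bach"
JAB = "Johann August Bach"

# P40 (child) edges, limited to the people mentioned in this thread.
CHILD = {
    JSB: {CPE},
    CPE: {JAB},
}


def one_hop(people):
    """Plain P40: follow exactly one child edge."""
    found = set()
    for person in people:
        found |= CHILD.get(person, set())
    return found


def one_or_more(start):
    """P40+: follow one or more child edges."""
    found, frontier = set(), {start}
    while frontier:
        frontier = one_hop(frontier) - found
        found |= frontier
    return found


def zero_or_more(start):
    """P40*: like P40+, but the zero-length path also matches the start node."""
    return {start} | one_or_more(start)


print(sorted(one_hop({JSB})))
print(sorted(one_or_more(JSB)))
print(sorted(zero_or_more(JSB)))
```

The plain path stops at the child, the + path also reaches the grandchild, and the * path additionally includes the starting person.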

TIL!! :D

BertrandCaron commented 1 year ago

Yes, but P279 is transitive (which means that if A is a subclass of B, and B is a subclass of C, a reasoner can deduce that A is also a subclass of C). But it seems that this mechanism (inferring new relationships from the data and the ontology) is not used in Wikidata.

In your example, P40 is not transitive, though a hypothetical property "descendant of" should be.
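To make the point about inference concrete, here is a small sketch of what a reasoner could do with a transitive property: materialise the implied pairs so that a single-hop lookup finds them. It is only an illustration of the idea and makes no claim about how the Wikidata query service is implemented; A, B and C are the placeholder classes from the comment above.

```python
def transitive_closure(pairs):
    """Repeatedly add (a, c) whenever (a, b) and (b, c) are both present."""
    closure = set(pairs)
    while True:
        new = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c
        } - closure
        if not new:
            return closure
        closure |= new


# Stated "subclass of" pairs: A is a subclass of B, B is a subclass of C.
stated = {("A", "B"), ("B", "C")}
inferred = transitive_closure(stated)

# Without inference, a single-hop lookup for subclasses of C only finds B.
print(sorted(s for (s, o) in stated if o == "C"))
# With the closure materialised, the same lookup also finds A.
print(sorted(s for (s, o) in inferred if o == "C"))
```

When the closure is not materialised, the property path operators (* and +) do the equivalent walk at query time, which fits with the results differing with and without them.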

BertrandCaron commented 1 year ago

There is a project about inference. It suggests that rules of reasoning are not clearly specified (in particular, what should be done with qualifiers?).

BertrandCaron commented 1 year ago

And a use case for P279.