blazegraph / database

Blazegraph High Performance Graph Database
GNU General Public License v2.0

Faulty response to certain queries combining DISTINCT and VALUES keywords #250

Open simonrihm opened 2 days ago

simonrihm commented 2 days ago

There seems to be a bug when processing queries of the form

SELECT DISTINCT ?object WHERE { VALUES ?subject { <s1> <s2> } ?subject <predicate> ?object . }

Instead of returning all distinct objects linked to the two given subjects by the predicate, it returns all instances of the subjects' type. This only happens when all of the following conditions are met:
a) Only the object variable is selected, using the DISTINCT keyword.
b) More than one instance is bound to the subject variable via the VALUES keyword.
c) The WHERE clause contains only one s-p-o statement (triple pattern).
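To make the expected result concrete, here is a minimal Python sketch of the standard SPARQL semantics for this query shape, evaluated over a tiny in-memory set of triples. The IRIs (`s1`, `s2`, `s3`, `o1`, `o2`, `o3`, `T`) and the helpers `match` and `select_distinct` are hypothetical illustrations, not Blazegraph code. The sketch only models how VALUES, a single triple pattern, projection and DISTINCT combine. It says nothing about how Blazegraph actually evaluates the query.

```python
# Hypothetical toy data: every IRI here is a placeholder, not real data.
TRIPLES = {
    ("s1", "predicate", "o1"),
    ("s1", "predicate", "o2"),
    ("s2", "predicate", "o2"),
    ("s3", "predicate", "o3"),  # s3 is not listed in VALUES, so o3 must not appear
    ("s1", "rdf:type", "T"),
    ("s3", "rdf:type", "T"),
}


def match(pattern, triples, binding):
    """Yield every extension of `binding` that makes `pattern` match a triple.

    Terms starting with '?' are variables; any other term must match exactly.
    """
    for triple in triples:
        extended = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable that is already bound must keep the same value.
                if extended.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield extended


def select_distinct(var, values_var, values, pattern, triples):
    """Model SELECT DISTINCT ?var WHERE { VALUES ?values_var { ... } pattern }."""
    # VALUES contributes one solution per listed term.
    solutions = [{values_var: v} for v in values]
    # Join those solutions with the single triple pattern.
    joined = [b for s in solutions for b in match(pattern, triples, s)]
    # Project onto ?var and drop duplicates; sorted() only makes the output
    # deterministic, since SPARQL defines no order without ORDER BY.
    return sorted({b[var] for b in joined})


print(select_distinct("?object", "?subject", ["s1", "s2"],
                      ("?subject", "predicate", "?object"), TRIPLES))
# prints ['o1', 'o2']
```

A correct engine should return only the objects reachable from the VALUES-bound subjects (`o1` and `o2` here). Per the report, the affected query instead returns all instances of the subjects' type.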

Workarounds I have found so far:
1) Do not use the DISTINCT keyword; instead, add GROUP BY ?object at the end of the query.
2) Add an arbitrary second triple pattern to the query that does not affect the results, e.g. ?x ?y ?z .
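Under standard SPARQL semantics, both workarounds should return exactly the same values as the intended DISTINCT query. GROUP BY on the only projected variable, with no aggregates, yields one row per distinct value. An extra triple pattern that matches at least once only multiplies the rows before DISTINCT removes the duplicates. The following Python sketch checks this on hypothetical toy data. It models the SPARQL semantics only, and the helper names (`match`, `evaluate`, `distinct`, `group_by`) are illustrative, not part of Blazegraph. It also covers the variant used in the Wikidata link for workaround 2, where the extra pattern shares the subject variable.

```python
# Hypothetical toy data: every IRI here is a placeholder, not real data.
TRIPLES = {
    ("s1", "predicate", "o1"),
    ("s1", "predicate", "o2"),
    ("s2", "predicate", "o2"),
    ("s3", "predicate", "o3"),
    ("s1", "rdf:type", "T"),
}


def match(pattern, triples, binding):
    """Yield every extension of `binding` that makes `pattern` match a triple."""
    for triple in triples:
        extended = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable that is already bound must keep the same value.
                if extended.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield extended


def evaluate(values_var, values, patterns, triples):
    """Join the VALUES solutions with each triple pattern in turn."""
    solutions = [{values_var: v} for v in values]
    for pattern in patterns:
        solutions = [b for s in solutions for b in match(pattern, triples, s)]
    return solutions


def distinct(var, solutions):
    """SELECT DISTINCT ?var: project, then remove duplicates."""
    return sorted({b[var] for b in solutions})


def group_by(var, solutions):
    """SELECT ?var ... GROUP BY ?var with no aggregates: one row per group key."""
    groups = {}
    for b in solutions:
        groups.setdefault(b[var], []).append(b)
    return sorted(groups)


MAIN = ("?subject", "predicate", "?object")
SUBJECTS = ["s1", "s2"]

intended = distinct("?object", evaluate("?subject", SUBJECTS, [MAIN], TRIPLES))
workaround_1 = group_by("?object", evaluate("?subject", SUBJECTS, [MAIN], TRIPLES))
# Workaround 2 as worded above: a disconnected ?x ?y ?z pattern (a cross product).
workaround_2 = distinct(
    "?object", evaluate("?subject", SUBJECTS, [MAIN, ("?x", "?y", "?z")], TRIPLES))
# Workaround 2 as in the Wikidata link: the extra pattern shares the subject.
workaround_2_shared = distinct(
    "?object", evaluate("?subject", SUBJECTS, [MAIN, ("?subject", "?y", "?z")], TRIPLES))

print(intended, workaround_1, workaround_2, workaround_2_shared)
# prints ['o1', 'o2'] ['o1', 'o2'] ['o1', 'o2'] ['o1', 'o2']
```

A disconnected `?x ?y ?z` pattern is a cross product with the whole store. On a dataset the size of Wikidata it can therefore be far more expensive than the subject-bound variant, even though the DISTINCT result is the same.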

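You can reproduce this bug, for example, in Wikidata; the reproduction query links are included in the quoted copy of this report in the email reply below.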

thompsonbry commented 2 days ago

Per Michael, this is likely to be a bug in the ASTStaticBindingsOptimizer.

Bryan

On Fri, Jun 28, 2024 at 03:58 simonrihm wrote:

You can reproduce this bug, for example, in Wikidata:
- Get the fathers of Johann Sebastian Bach, Johann Jacob Bach, and Wolfgang Amadeus Mozart. This works correctly because no DISTINCT keyword is used, but it returns Johann Ambrosius Bach twice, since he is the father of both Sebastian and Jacob:
  https://query.wikidata.org/#SELECT%20%3Ffather%0AWHERE%0A%7B%0A%20%20%3Fchild%20wdt%3AP22%20%3Ffather%20.%0A%20%20VALUES%20%3Fchild%20%7B%0A%20%20%20%20wd%3AQ1339%0A%20%20%20%20wd%3AQ539427%0A%20%20%20%20wd%3AQ254%0A%20%20%20%20%7D%0A%7D%0ALIMIT%2050
- The bug appears when the DISTINCT keyword is added (LIMIT is therefore needed; otherwise the query tries to return every child on Wikidata):
  https://query.wikidata.org/#SELECT%20DISTINCT%20%3Ffather%0AWHERE%0A%7B%0A%20%20%3Fchild%20wdt%3AP22%20%3Ffather%20.%0A%20%20VALUES%20%3Fchild%20%7B%0A%20%20%20%20wd%3AQ1339%0A%20%20%20%20wd%3AQ539427%0A%20%20%20%20wd%3AQ254%0A%20%20%20%20%7D%0A%7D%0ALIMIT%2050
- Use of workaround 1:
  https://query.wikidata.org/#SELECT%20%3Ffather%0AWHERE%0A%7B%0A%20%20%3Fchild%20wdt%3AP22%20%3Ffather%20.%0A%20%20VALUES%20%3Fchild%20%7B%0A%20%20%20%20wd%3AQ1339%0A%20%20%20%20wd%3AQ539427%0A%20%20%20%20wd%3AQ254%0A%20%20%20%20%7D%0A%7D%0AGROUP%20BY%20%3Ffather%0ALIMIT%2050
- Use of workaround 2:
  https://query.wikidata.org/#SELECT%20DISTINCT%20%3Ffather%0AWHERE%0A%7B%0A%20%20%3Fchild%20wdt%3AP22%20%3Ffather%20%3B%0A%20%20%20%20%20%20%20%20%20%3Fy%20%3Fz%20.%0A%20%20VALUES%20%3Fchild%20%7B%0A%20%20%20%20wd%3AQ1339%0A%20%20%20%20wd%3AQ539427%0A%20%20%20%20wd%3AQ254%0A%20%20%20%20%7D%0A%7D%0ALIMIT%2050
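The reproduction queries are only available as URL-encoded links. A short Python sketch using the standard library's `urllib.parse` can print their SPARQL text, which makes it easy to see that the first two queries differ only in the DISTINCT keyword. The `decode_query` helper and the dictionary labels are illustrative. The sketch assumes, as holds for these links, that the query is stored in the URL fragment (the part after `#`).

```python
from urllib.parse import unquote, urlsplit

# Reproduction links from the report; the SPARQL text is URL-encoded in the fragment.
LINKS = {
    "without DISTINCT": "https://query.wikidata.org/#SELECT%20%3Ffather%0AWHERE%0A%7B%0A%20%20%3Fchild%20wdt%3AP22%20%3Ffather%20.%0A%20%20VALUES%20%3Fchild%20%7B%0A%20%20%20%20wd%3AQ1339%0A%20%20%20%20wd%3AQ539427%0A%20%20%20%20wd%3AQ254%0A%20%20%20%20%7D%0A%7D%0ALIMIT%2050",
    "with DISTINCT (bug)": "https://query.wikidata.org/#SELECT%20DISTINCT%20%3Ffather%0AWHERE%0A%7B%0A%20%20%3Fchild%20wdt%3AP22%20%3Ffather%20.%0A%20%20VALUES%20%3Fchild%20%7B%0A%20%20%20%20wd%3AQ1339%0A%20%20%20%20wd%3AQ539427%0A%20%20%20%20wd%3AQ254%0A%20%20%20%20%7D%0A%7D%0ALIMIT%2050",
    "workaround 1 (GROUP BY)": "https://query.wikidata.org/#SELECT%20%3Ffather%0AWHERE%0A%7B%0A%20%20%3Fchild%20wdt%3AP22%20%3Ffather%20.%0A%20%20VALUES%20%3Fchild%20%7B%0A%20%20%20%20wd%3AQ1339%0A%20%20%20%20wd%3AQ539427%0A%20%20%20%20wd%3AQ254%0A%20%20%20%20%7D%0A%7D%0AGROUP%20BY%20%3Ffather%0ALIMIT%2050",
    "workaround 2 (extra pattern)": "https://query.wikidata.org/#SELECT%20DISTINCT%20%3Ffather%0AWHERE%0A%7B%0A%20%20%3Fchild%20wdt%3AP22%20%3Ffather%20%3B%0A%20%20%20%20%20%20%20%20%20%3Fy%20%3Fz%20.%0A%20%20VALUES%20%3Fchild%20%7B%0A%20%20%20%20wd%3AQ1339%0A%20%20%20%20wd%3AQ539427%0A%20%20%20%20wd%3AQ254%0A%20%20%20%20%7D%0A%7D%0ALIMIT%2050",
}


def decode_query(url):
    """Return the SPARQL query embedded in a query.wikidata.org link."""
    return unquote(urlsplit(url).fragment)


for label, url in LINKS.items():
    print(f"# {label}")
    print(decode_query(url))
    print()
```

For example, the buggy query decodes to `SELECT DISTINCT ?father` followed by a WHERE clause with the single pattern `?child wdt:P22 ?father .` and a VALUES block listing `wd:Q1339`, `wd:Q539427` and `wd:Q254`, then `LIMIT 50`.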