RDFLib / sparqlwrapper

A wrapper for a remote SPARQL endpoint
https://sparqlwrapper.readthedocs.io/

Order of ResultRows for `SELECT *` #166

Open chiarcos opened 3 years ago

chiarcos commented 3 years ago

I might be missing something, but the order of variables in the ResultRows of a `SELECT *` query seems to be random and to change between runs. This is unexpected, because the other SPARQL engines I work with generally return the variables in the order in which they occur in the WHERE block.

Sample code:

    query = "SELECT * { ?a ?b ?c } LIMIT 10"
    qres = g.query(query)
    print(qres.vars)

Expected output: [rdflib.term.Variable('a'), rdflib.term.Variable('b'), rdflib.term.Variable('c')]

Actual output:

    [rdflib.term.Variable('c'), rdflib.term.Variable('a'), rdflib.term.Variable('b')] (first run)
    [rdflib.term.Variable('a'), rdflib.term.Variable('b'), rdflib.term.Variable('c')] (second run)
    [rdflib.term.Variable('b'), rdflib.term.Variable('a'), rdflib.term.Variable('c')] (third run)
    [rdflib.term.Variable('a'), rdflib.term.Variable('c'), rdflib.term.Variable('b')] (fourth run)

[you get the idea]

(Tested on the HDT edition of DBpedia 2016, created with g = rdflib.Graph(store=rdflib_hdt.HDTStore(rdf_file)), but that shouldn't matter.)

Our use case: we run SPARQL queries whose number of variables isn't known in advance, we return a binding for every variable, and the WHERE block (and only the WHERE block) is provided by the client. I could enforce a constant order by sorting the keys (variables) lexicographically, but that order might still be unexpected to users, because it depends on their naming preferences.
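A possible client-side workaround, sketched here under stated assumptions rather than taken from the library: derive the variable order from the client's WHERE block by first occurrence, then either build an explicit projection instead of `SELECT *` or reorder each row afterwards. The helper names (`variables_in_order`, `explicit_select`, `reorder`) are hypothetical, and the regex is deliberately naive: it assumes no `?` or `$` followed by a letter inside IRIs, string literals, or comments, and it does not account for subqueries that hide variables from the outer scope.

```python
import re
from typing import Any, Dict, List, Optional

# Naive pattern for SPARQL variables (?name or $name).
# Assumption: no '?' or '$' followed by a letter occurs inside IRIs,
# string literals, or comments in the WHERE block.
_VAR = re.compile(r"[?$]([A-Za-z_][A-Za-z0-9_]*)")


def variables_in_order(where_block: str) -> List[str]:
    """Return variable names by first occurrence, without duplicates."""
    # dict.fromkeys keeps insertion order and drops repeated names.
    return list(dict.fromkeys(_VAR.findall(where_block)))


def explicit_select(where_block: str, limit: Optional[int] = None) -> str:
    """Build a query with an explicit projection instead of SELECT *."""
    names = variables_in_order(where_block)
    # Fall back to '*' if the block contains no variables at all.
    projection = " ".join("?" + name for name in names) or "*"
    query = f"SELECT {projection} WHERE {where_block}"
    if limit is not None:
        query += f" LIMIT {limit}"
    return query


def reorder(row: Dict[str, Any], names: List[str]) -> List[Any]:
    """Arrange one result row (a name -> value mapping) in the given order."""
    # Unbound variables come back as None.
    return [row.get(name) for name in names]


print(variables_in_order("{ ?a ?b ?c }"))
# ['a', 'b', 'c']
print(explicit_select("{ ?a ?b ?c }", limit=10))
# SELECT ?a ?b ?c WHERE { ?a ?b ?c } LIMIT 10
```

With an explicit projection the order is stated in the query itself, so it no longer depends on how the engine enumerates variables for `SELECT *`. If the query text cannot be rewritten, `reorder` can be applied to each row after converting it to a name-to-value mapping.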

PritishWadhwa commented 2 years ago

resolved here