Open pfps opened 7 years ago
Real ones keep it real and have full access to remove all spiders and bugs in the current system
I have the same problem and no one has been able to help me with it. Any advice would be much appreciated.
Hi! Can I work on this issue? Seems interesting to me. Just waiting for the green light from the maintainers and contributors.
Sure, you are welcome. We are happy about any pull request that helps us fix bugs.
Hi everyone!
I believe I was able to find the crux of the issue. I modified the query by removing the "DISTINCT" keyword and selecting ?value along with ?this.
foo = run("""
    SELECT ?this ?value WHERE {
        ?this rdf:type <http://ex.com/c1> .
        OPTIONAL { ?this <http://ex.com/rt> ?value . }
    }
    GROUP BY ?this
    HAVING ( COUNT (?value) < 3 )
""")
This produces the following result consisting of two tuples in the form of (?this, ?value):
>>>>> (rdflib.term.URIRef('http://ex.com/i1'),rdflib.term.URIRef('http://ex.com/i4'))
>>>>> (rdflib.term.BNode('f79fa7e5d2efe481a9a0e68ee8996a731b1'), None)
If we look at the second result tuple, ?value seems to be None in this case.
Interestingly, DISTINCT is not defined over None values in the RDFLib code.
For example, if you have a list of ?value results such as ['x1', 'x2', . . . , None, None, None], the "DISTINCT" feature won't be able to take a distinct None value from the list, hence the error in your case.
However, handling DISTINCT over None values is not yet defined in the W3C standards for SPARQL.
(Interestingly, handling null values with DISTINCT in MySQL is fairly common!)
I don't see how None should be handled at all by SPARQL, as it is not part of the RDF data model or the SPARQL model. One major difference between RDF and relational data models is that RDF has no null values, removing a major problem that affects SQL. (Of course, blank nodes are a problem for SPARQL.)
If None is showing up in intermediate results in rdflib, then this is an artifact of the implementation, probably signalling that a query variable has no value, and it has to be handled specially.
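To make that point concrete, here is a minimal pure-Python sketch of how an OPTIONAL pattern can leave a variable unbound and how None might only appear once a solution is flattened into a result row. It does not use rdflib and is not its actual implementation; the names `left_join` and `project`, the dict-based solution mappings, and the `_:b0` blank node label are all assumptions made for illustration.

```python
def left_join(required, optional, key):
    """OPTIONAL-style join: keep every required solution, extend it when a match exists."""
    joined = []
    for sol in required:
        matches = [opt for opt in optional if opt[key] == sol[key]]
        if matches:
            joined.extend({**sol, **opt} for opt in matches)
        else:
            # No match: the optional variable stays unbound (no key at all).
            joined.append(dict(sol))
    return joined


def project(solution, variables):
    """Flatten a solution mapping into a row; unbound variables show up as None."""
    return tuple(solution.get(var) for var in variables)


# Hypothetical data mirroring the two result tuples reported above.
required = [{"this": "http://ex.com/i1"}, {"this": "_:b0"}]
optional = [{"this": "http://ex.com/i1", "value": "http://ex.com/i4"}]

solutions = left_join(required, optional, "this")
rows = [project(sol, ("this", "value")) for sol in solutions]
for row in rows:
    print(row)
```

In this sketch the second solution never contains a "value" entry; None is only a display convention introduced by `project`, not a term in the data.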
@pfps thanks for the tip. It looks like there are three bindings in the example where ?value is not bound, as it is selected in an optional part of the query. These unbound entries should be removed before the engine gets to the DISTINCT operator.
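A rough sketch of that idea in plain Python, assuming solutions are represented as dicts in which an unbound variable may appear as a None-valued entry. The helpers `drop_unbound` and `distinct` are hypothetical and are not rdflib functions; this only illustrates stripping unbound entries before deduplicating, not how the engine is actually structured.

```python
def drop_unbound(solution):
    """Remove entries whose value is None, i.e. variables that were never bound."""
    return {var: term for var, term in solution.items() if term is not None}


def distinct(solutions):
    """Order-preserving DISTINCT over solution mappings, ignoring unbound entries."""
    seen = set()
    unique = []
    for sol in solutions:
        cleaned = drop_unbound(sol)
        key = frozenset(cleaned.items())  # hashable identity of the bound part
        if key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


# Hypothetical intermediate results: the first two differ only in how
# the unbound ?value is represented.
solutions = [
    {"this": "_:b0", "value": None},
    {"this": "_:b0"},
    {"this": "http://ex.com/i1", "value": "http://ex.com/i4"},
]

unique = distinct(solutions)
for sol in unique:
    print(sol)
```

With the unbound entries removed first, the DISTINCT step only ever compares real terms, so it never has to decide what equality over None means.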
When I run the attached query (in the example) I get a strange error from inside rdflib query. This error goes away if the DISTINCT is removed.
example.py
:test-data.ttl
Output:
(edited to include example code)