Closed WolfgangFahl closed 5 months ago
Maybe this issue arises because the result of a SPARQL Update query (in SPARQLWrapper) is not a result that can be converted/parsed, due to the nature of the SPARQL protocol specification.
For SPARQL Query (SELECT, ASK, DESCRIBE, CONSTRUCT),
The response body of a successful query operation with a 2XX response is either:
a SPARQL Results Document in XML, JSON, or CSV/TSV format (for SPARQL Query forms SELECT and ASK); or, an RDF graph [RDF-CONCEPTS] serialized, for example, in the RDF/XML syntax [RDF-XML], or an equivalent RDF graph serialization, for SPARQL Query forms DESCRIBE and CONSTRUCT). https://www.w3.org/TR/2013/REC-sparql11-protocol-20130321/#query-success
And in the case of SPARQL Update,
The response body of a successful update request is implementation defined. Implementations may use HTTP content negotiation to provide both human-readable and machine-processable information about the completed update request. https://www.w3.org/TR/2013/REC-sparql11-protocol-20130321/#update-success
So you will need to distinguish the post-processing of a SPARQL query request from that of a SPARQL update request.
You can see an example of SPARQL Update request in the documentation. https://sparqlwrapper.readthedocs.io/en/stable/main.html#sparql-update-example
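One way to make that distinction on the client side is to look at the first keyword after the PREFIX/BASE prologue. The following is only a minimal sketch: the function name `classify_sparql` and the keyword sets are illustrative assumptions, not part of SPARQLWrapper, and the parsing is deliberately naive (it does not handle comments).

```python
import re

# Keyword sets are an assumption of this sketch, derived from the SPARQL 1.1
# query forms and update operations; they are not taken from SPARQLWrapper.
QUERY_FORMS = {"SELECT", "ASK", "DESCRIBE", "CONSTRUCT"}
UPDATE_FORMS = {"INSERT", "DELETE", "LOAD", "CLEAR", "CREATE",
                "DROP", "COPY", "MOVE", "ADD", "WITH"}

# Matches one PREFIX or BASE declaration of the prologue.
_PROLOGUE = re.compile(
    r"\s*(?:PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s*<[^>]*>)", re.IGNORECASE
)


def classify_sparql(text):
    """Return 'query', 'update' or 'unknown' for a SPARQL request string."""
    pos = 0
    # Skip all PREFIX/BASE declarations.
    while True:
        match = _PROLOGUE.match(text, pos)
        if not match:
            break
        pos = match.end()
    # The first word after the prologue decides the request kind.
    first = re.match(r"\s*([A-Za-z]+)", text[pos:])
    keyword = first.group(1).upper() if first else ""
    if keyword in QUERY_FORMS:
        return "query"
    if keyword in UPDATE_FORMS:
        return "update"
    return "unknown"
```

A caller could then parse the body only for "query" results and check just the HTTP status code for "update" requests.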
see also http://wiki.bitplan.com/index.php/DgraphAndWeaviateTest - i am sending an e-mail to the apache users list now to get more info
text/html;charset=utf-8 is not a Fuseki response from the SPARQL engine. Was the endpoint an HTML page?
@afs thank you for looking into this. I followed the procedure in http://wiki.bitplan.com/index.php/DgraphAndWeaviateTest#Apache_Jena
scripts/jena -l sampledata/example.ttl
scripts/jena -f example
which will install apache jena - load the sample data and fire up the fuseki server.
apache-jena-fuseki-3.16.0
apache-jena-3.16.0
start loading sampledata/example.ttl to /Users/wf/Documents/py-workspace/DgraphAndWeaviateTest/data at 2020-08-15T14:41:24Z
finished loading sampledata/example.ttl to /Users/wf/Documents/py-workspace/DgraphAndWeaviateTest/data at 2020-08-15T14:41:26Z
16:41:26 INFO loader :: Loader = LoaderPhased
16:41:26 INFO loader :: Start: sampledata/example.ttl
16:41:26 INFO loader :: Finished: sampledata/example.ttl: 20 tuples in 0.07s (Avg: 298)
16:41:26 INFO loader :: Finish - index SPO
16:41:26 INFO loader :: Start replay index SPO
16:41:26 INFO loader :: Index set: SPO => SPO->POS, SPO->OSP
16:41:26 INFO loader :: Index set: SPO => SPO->POS, SPO->OSP [20 items, 0.0 seconds]
16:41:26 INFO loader :: Finish - index OSP
16:41:26 INFO loader :: Finish - index POS
scripts/jena -f example
apache-jena-fuseki-3.16.0
apache-jena-3.16.0
starting fuseki server
16:41:52 INFO Server :: Running in read-only mode for /example
16:41:52 INFO Server :: Apache Jena Fuseki 3.16.0
16:41:52 INFO Config :: FUSEKI_HOME=/Users/wf/Documents/pyworkspace/DgraphAndWeaviateTest/lib/apache-jena-fuseki-3.16.0/.
16:41:52 INFO Config :: FUSEKI_BASE=/Users/wf/Documents/pyworkspace/DgraphAndWeaviateTest/lib/apache-jena-fuseki-3.16.0/run
16:41:52 INFO Config :: Shiro file: file:///Users/wf/Documents/pyworkspace/DgraphAndWeaviateTest/lib/apache-jena-fuseki-3.16.0/run/shiro.ini
16:41:52 INFO Config :: Template file: templates/config-tdb2-dir
16:41:52 INFO Config :: TDB dataset: directory=/Users/wf/Documents/py-workspace/DgraphAndWeaviateTest/data
I took the endpoint info from http://localhost:3030/dataset.html which states that SPARQL Query would be at: /example/query and SPARQL Update at /example/update
Thus in the unit test:
def getJena(self,mode='query'):
    endpoint="http://localhost:3030/example/%s" % mode
    jena=Jena(endpoint)
    return jena
text/html;charset=utf-8 comes from somewhere, but I can't make it happen with Fuseki 3.16.0. Even static HTML pages come back with Content-Type: text/html -- no charset.
What does the Fuseki server log file contain? If there are no entries for the request, then the request didn't reach Fuseki.
@afs The log has:
18:57:10 INFO Fuseki :: [60] POST http://localhost:3030/example/update
18:57:10 INFO Fuseki :: [60] 200 OK (7 ms)
I tried a curl request
curl --data-binary @insert.txt http://localhost:3030/example/update
Error 400: Bad Request
with insert.txt=
PREFIX cr: <http://cr.bitplan.com/>
INSERT DATA {
cr:version cr:author "Wolfgang Fahl".
}
but get an Error 400: Bad Request.
I tried debugging the http call in my Python IDE. But due to all the abstraction layers it's pretty hard.
full_url is: str: http://localhost:3030/example/update
'Content-type' (4341599984) str: application/x-www-form-urlencoded
'User-agent' (4341609136) str: sparqlwrapper 1.8.5 (rdflib.github.io/sparqlwrapper)
'Accept' (4341610160) str: application/sparql-results+json,application/json,text/javascript,application/javascript
data: bytes: b'update=%0A++++++++PREFIX+cr%3A+%3Chttp%3A//cr.bitplan.com/%3E%0A++++++++INSERT+DATA+%7B+%0A++++++++++cr%3Aversion+cr%3Aauthor+%22Wolfgang+Fahl%22.+%0A++++++++%7D%0A++++++++'
The raw response is: headers HTTPMessage: Connection: close\nDate: Sat, 15 Aug 2020 17:06:28 GMT\nFuseki-Request-ID: 63\nContent-Type: text/html;charset=utf-8\n\n
The SPARQLWrapper code then expects a content-type of XML, JSON, RDF/XML, N3, CSV, JSON-LD and since none of these is found a warning is issued (unfortunately unconditionally - since the call is ok it would be sufficient to ignore the problem or just look for the html content having the success message).
re: curl --data-binary @insert.txt -- this sends the content in the body -- you need to set the content type.
curl -v -g --header 'Content-type: application/sparql-update' --data-binary 'INSERT DATA{}' http://localhost:3030/example/update
else it is application/x-www-form-urlencoded, in which case you need "update=" in insert.txt.
curl -v -g -d'update=INSERT DATA{}' http://localhost:3030/example/update
The Fuseki response to application/x-www-form-urlencoded is an HTML page -- i.e. something displayable -- which is reasonable because it was sent an HTML form.
If the expected content is an RDF format, then it looks like the client code is processing it more like a query.
Content-type: application/sparql-update
with the body holding the update request in UTF-8 by setting the Content-type. This is the better way - HTML forms have size limitations in practice.
For an update response - only the status code is needed - an application can ignore the response body (but it must consume the bytes to preserve connection caching). The "readthedocs" reference looks right.
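The two ways of sending an update described above can be sketched with the Python standard library. This is an illustration only, not what SPARQLWrapper does internally; the helper names are hypothetical, the endpoint URL is the one used in this thread, and no request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as used in this thread; adjust for your own Fuseki dataset.
UPDATE_ENDPOINT = "http://localhost:3030/example/update"


def direct_update_request(update, endpoint=UPDATE_ENDPOINT):
    """POST the update text itself as the body (the preferred way)."""
    return Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


def form_update_request(update, endpoint=UPDATE_ENDPOINT):
    """POST the update as an HTML form field named 'update'."""
    return Request(
        endpoint,
        data=urlencode({"update": update}).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing either object to `urllib.request.urlopen` would send it. Only the status code then matters for success, but the body should still be read to the end.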
@afs, @dayures thank you for your effort, which led to finding out how to do things:
self.sparql.setRequestMethod(POSTDIRECTLY)
is the key to properly handling updates. The documentation might want to more prominently point this out. E.g. there is no example in the scripts directory showing the usage.
'''
Created on 2020-08-14
@author: wf
'''
from SPARQLWrapper import SPARQLWrapper, JSON
from SPARQLWrapper.Wrapper import POSTDIRECTLY, POST

class Jena(object):
    '''
    wrapper for Apache Jena
    '''

    def __init__(self,url,mode='query',returnFormat=JSON):
        '''
        Constructor
        '''
        # url is taken to be the full endpoint, e.g. http://localhost:3030/example/update
        self.url=url
        self.mode=mode
        self.sparql=SPARQLWrapper(url,returnFormat=returnFormat)

    def rawQuery(self,queryString,method=POST):
        '''
        query with the given query string
        '''
        self.sparql.setQuery(queryString)
        self.sparql.setMethod(method)
        queryResult = self.sparql.query()
        return queryResult

    def getResults(self,jsonResult):
        '''
        get the result from the given jsonResult
        '''
        return jsonResult["results"]["bindings"]

    def insert(self,insertCommand):
        '''
        run an insert
        '''
        # send the update as the request body with Content-type application/sparql-update
        self.sparql.setRequestMethod(POSTDIRECTLY)
        response=self.rawQuery(insertCommand, method=POST)
        return response

    def query(self,queryString,method=POST):
        '''
        get a list of results for the given query
        '''
        queryResult=self.rawQuery(queryString,method=method)
        jsonResult=queryResult.convert()
        return self.getResults(jsonResult)
It would be better if the update read the response body and threw it away.
For example, it may be a parse error and there is an error message but even for a zero length body, it is better to read it and hit the end of stream.
For any HTTP usage, if the caller does not read all of the response body, the connection cannot be reused for another request because to the HTTP code it looks like it is still in use. For a few requests this may not matter in the client, though it is unhelpful in the server and may impact other clients. It is slower to open a TCP connection for every request.
(This is not specific to the SPARQL protocol - it applies to all HTTP usage.)
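Reading a response body to the end and throwing it away could look like the sketch below. The helper name `drain` is hypothetical. It works on any file-like object with a `read(size)` method, which includes the response objects returned by `urllib`.

```python
def drain(response, chunk_size=8192):
    """Read a response body to end of stream, discarding the data.

    Returns the number of bytes consumed, which can be useful for logging.
    """
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # empty bytes object signals end of stream
            return total
        total += len(chunk)
```

With SPARQLWrapper this would presumably be applied to the `response` attribute of the object returned by `query()`. That is an assumption based on the `results.response.read()` call in the documentation example.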
@afs - i think i get an empty response in case of success and an exception in case of error; that's what http://wiki.bitplan.com/index.php/DgraphAndWeaviateTest#Apache_unit_test now tests. How come you assume the body is not read?
Because I can't see a line that does it (not that I know SPARQLWrapper, but there was an SO question a while back that came down to holding connections open, and eventually the server ran out of serving threads).
response=self.rawQuery(insertCommand, method=POST)
return response
"response" is "queryResult" -- the document has results.response.read() (different 'response').
This will not show up in a unit test. I don't know if reading the header and status code also causes reading the whole of the body (which would be non-streaming).
@afs - thanks for the hint - i added a dummy line. Now I am stuck at https://stackoverflow.com/questions/63435157/listofdict-to-rdf-conversion-in-python-targeting-apache-jena-fuseki
@afs @dayures The result of all this is an extension of the SPARQLWrapper at https://github.com/WolfgangFahl/DgraphAndWeaviateTest - for this issue only the documentation part is open. I am going to open a new issue regarding the ListOfDict conversion
With Jena 4.9.0 and SPARQLWrapper 2.0 i now get Exception: HTTP Error 415: Unsupported Media Type
Jena does not return the message "Unsupported Media Type". The 415 cases have a different message.
"Unsupported Media Type" is the generic error message so it is not clear the operation is going to Fuseki at all.
Check the server log.
If the operation gets there, it is logged with an error message. You can also run it with "-v" to get a detailed HTTP request log.
In case it is calling Fuseki and the error message wasn't available (this happens with HTTP/2 - the server log has the correct error message in it):
If it is to an update-specific endpoint (.../update) - there is a Content-type but it's not right for update. The correct MIME type is "application/sparql-update", or an HTML form that includes "request=" (see above https://github.com/RDFLib/sparqlwrapper/issues/159#issuecomment-674427602).
@afs Thank you for the swift response! Jena 4.10 needs a --update on start and has different endpoints for update and query. Thanks to Tim Holzheim for finding these details and changing our test setup accordingly.
I get the error message:
when running the unit test below. I found
which both do not explain the reason for the problem and e.g. how to work around it. I assume the content-type for Apache Jena must be different. How could i set it?
python unit test
jena.py helper module