RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License
2.18k stars 560 forks source link

SPARQLConnector.update should set ContentType charset #2095

Closed robcast closed 2 years ago

robcast commented 2 years ago

I use rdflib to connect to a Blazegraph triplestore using the SPARQLUpdateStore.

When I add data containing non-ASCII characters using addN() it results in garbled data in the triplestore (UTF-8 interpreted as ISO-8859).

The problem seems to be that SPARQLConnector.update() does not set "charset=UTF-8" in the Content-Type header:

https://github.com/RDFLib/rdflib/blob/a70a9c82649f9345c3691d69df6f1108e9e05871/rdflib/plugins/stores/sparqlconnector.py#L170

but it does use UTF-8 in data=query.encode() in line 183.

The problem goes away if I patch the Content-Type to be "application/sparql-update; charset=UTF-8".

I can provide a PR with this one-line change if you like.

aucampia commented 2 years ago

@robcast thanks for raising the issue, a PR will be welcome, but it should include tests. Whether or not you create a PR we will likely include a fix for this in the next release.

robcast commented 2 years ago

I can try to add tests if it's not too hard :-) What kind of tests do you need?

The issue is at the system boundary between rdflib and the triplestore which makes it more complicated to test. Would https://github.com/RDFLib/rdflib/blob/master/test/test_store/test_store_sparqlupdatestore_mock.py be a place for that?

aucampia commented 2 years ago

For the SPARQLConnector/SPARQLUpdatestore test should use some from of mocking, given that the expectation is that something happens on the network layer the best option is to use our http mock as is being done here.

There is already some http mock based tests in test/test_store/test_store_sparqlupdatestore_mock.py but they are a bit dated, though it would be the right file for tests.

aucampia commented 2 years ago

Just a heads up, as there is already a PR waiting for review on test/test_store/test_store_sparqlupdatestore_mock.py (i.e. https://github.com/RDFLib/rdflib/pull/2089) any other change to that file will only be merged after that PR.

Reviews are welcome though.

aucampia commented 2 years ago

Just a heads up, as there is already a PR waiting for review on test/test_store/test_store_sparqlupdatestore_mock.py (i.e. #2089) any other change to that file will only be merged after that PR.

Reviews are welcome though.

2089 has been merged thanks to the review from @gjhiggins - so it is now open for changes.

robcast commented 2 years ago

I added a PR with a test in test_store_sparqlupdatestore_mock.py.

The test is just a copy of the existing test asserting a different part in the request header. It is not minimal and the assertion could be integrated into the existing test.