blazegraph / database

Blazegraph High Performance Graph Database
GNU General Public License v2.0

Incorrect default encoding (ISO-8859-1) assumed when submitting SPARQL query as POST request #224

Open SimonBin opened 2 years ago

SimonBin commented 2 years ago

According to the SPARQL 1.1 Protocol (https://www.w3.org/TR/sparql11-protocol/#query-via-post-direct), the body of a query sent directly via POST must be encoded as UTF-8.

Blazegraph reads the request body with the servlet getReader method:

https://github.com/blazegraph/database/blob/bc439f9d6c37bb4a1d33878b2054853714d5d9a9/bigdata-core/bigdata-sails/src/java/com/bigdata/rdf/sail/webapp/QueryServlet.java#L919-L922

which falls back to ISO-8859-1 when the request does not declare a charset (Tomcat's default is shown here):

https://github.com/apache/tomcat/blob/7c0dd42ac4e9533d73d4ba50791ab2dda9d79760/java/org/apache/coyote/Constants.java#L30
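The effect of this mismatch can be reproduced without a server. The following is a minimal sketch in Python (not Blazegraph's code) that encodes the query text as UTF-8, as a compliant client would, and then decodes the bytes as ISO-8859-1, as a reader with that default would. The variable names are illustrative only.

```python
# Query text as a client would send it (taken from the curl example in this issue).
query = "SELECT ?x { BIND('Curaçao' As ?x) }"

# A standards-compliant client encodes the request body as UTF-8.
body = query.encode("utf-8")

# A reader that assumes ISO-8859-1 maps every byte to exactly one character,
# so the two-byte UTF-8 sequence for "ç" (0xC3 0xA7) becomes two characters.
misread = body.decode("iso-8859-1")

# Decoding the same bytes as UTF-8 recovers the original text.
correct = body.decode("utf-8")

print(misread)
print(correct)
```

The misread string is one character longer than the original, because the single character "ç" is replaced by "Ã§".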

As a result, non-ASCII characters in the query are decoded incorrectly, as the following query shows:

curl -H "Content-Type: application/sparql-query" -d "SELECT ?x { BIND('Curaçao' As ?x) }" https://query.wikidata.org/sparql
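Because getReader honours a charset declared in the Content-Type header, a client may be able to work around the problem by stating the charset explicitly. The sketch below builds such a request with Python's standard library. The helper name build_sparql_post is hypothetical, and whether a given endpoint accepts the charset parameter and then decodes the body correctly is an assumption that has not been verified here.

```python
import urllib.request


def build_sparql_post(endpoint: str, query: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a direct SPARQL POST request."""
    return urllib.request.Request(
        endpoint,
        # Encode the body as UTF-8, as the SPARQL protocol expects.
        data=query.encode("utf-8"),
        # Assumption: declaring the charset lets the server's reader pick UTF-8
        # instead of falling back to ISO-8859-1.
        headers={"Content-Type": "application/sparql-query; charset=UTF-8"},
        method="POST",
    )


req = build_sparql_post(
    "https://query.wikidata.org/sparql",
    "SELECT ?x { BIND('Curaçao' As ?x) }",
)

# Sending is left to the caller, for example:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```

This only changes what the client declares. The server-side default described above is unchanged, so clients that omit the charset are still affected.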

This problem occurs in practice when Jena queries Wikidata from a SPARQL SERVICE clause; see https://github.com/apache/jena/issues/1259#issuecomment-1100607544

It is most likely also the cause of issue #206.