eclipse-rdf4j / rdf4j

Eclipse RDF4J: scalable RDF for Java
https://rdf4j.org/
BSD 3-Clause "New" or "Revised" License

Workbench does not support unicode in SPARQL queries #3659

Open jakubklimek opened 2 years ago

jakubklimek commented 2 years ago

Current Behavior

When I enter a SPARQL query containing Unicode characters in Workbench (3.7.4 and 4.0.0M2), running on a localhost Tomcat instance (9.0.56, JDK 17.0.1), e.g.:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dqv: <http://www.w3.org/ns/dqv#>
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>

PREFIX : <https://data.gov.cz/zdroj/datová-kvalita/metriky/>

SELECT ?jméno_poskytovatele ?početSPARQL ?početDCATAPDokumenty ?početCKAN ?početFormulář
WHERE {
  [] <urn:AktuálníDatumIRI> ?period .

  [] a qb:Observation, dqv:QualityMeasurement ;
    dqv:isMeasurementOf :PočetDatovýchSadZLKODCKAN ;
    sdmx-dimension:refPeriod ?period ;
    dqv:computedOn ?poskytovatel ;
    dqv:value ?početCKAN .

  [] a qb:Observation, dqv:QualityMeasurement ;
    dqv:isMeasurementOf :PočetDatovýchSadZLKODDokumentyDCAT ;
    sdmx-dimension:refPeriod ?period ;
    dqv:computedOn ?poskytovatel ;
    dqv:value ?početDCATAPDokumenty .

  [] a qb:Observation, dqv:QualityMeasurement ;
    dqv:isMeasurementOf :PočetDatovýchSadZLKODSPARQL ;
    sdmx-dimension:refPeriod ?period ;
    dqv:computedOn ?poskytovatel ;
    dqv:value ?početSPARQL .

  [] a qb:Observation, dqv:QualityMeasurement ;
    dqv:isMeasurementOf :PočetDatovýchSadZFormuláře ;
    sdmx-dimension:refPeriod ?period ;
    dqv:computedOn ?poskytovatel ;
    dqv:value ?početFormulář .

  # hack na zrychlení v rdf4j
  OPTIONAL {?poskytovatel foaf:name ?jméno_poskytovatele . }
  FILTER(BOUND(?jméno_poskytovatele))  
}
ORDER BY ?jméno_poskytovatele

I first get a notice that this query will be POSTed; then a lexical error occurs and the query text gets rewritten to:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dqv: <http://www.w3.org/ns/dqv#>
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>

PREFIX : <https://data.gov.cz/zdroj/datová-kvalita/metriky/>

SELECT ?jméno_poskytovatele ?početSPARQL ?početDCATAPDokumenty ?početCKAN ?početFormulář
WHERE {
  [] <urn:AktuálníDatumIRI> ?period .

  [] a qb:Observation, dqv:QualityMeasurement ;
    dqv:isMeasurementOf :PočetDatovýchSadZLKODCKAN ;
    sdmx-dimension:refPeriod ?period ;
    dqv:computedOn ?poskytovatel ;
    dqv:value ?početCKAN .

  [] a qb:Observation, dqv:QualityMeasurement ;
    dqv:isMeasurementOf :PočetDatovýchSadZLKODDokumentyDCAT ;
    sdmx-dimension:refPeriod ?period ;
    dqv:computedOn ?poskytovatel ;
    dqv:value ?početDCATAPDokumenty .

  [] a qb:Observation, dqv:QualityMeasurement ;
    dqv:isMeasurementOf :PočetDatovýchSadZLKODSPARQL ;
    sdmx-dimension:refPeriod ?period ;
    dqv:computedOn ?poskytovatel ;
    dqv:value ?početSPARQL .

  [] a qb:Observation, dqv:QualityMeasurement ;
    dqv:isMeasurementOf :PočetDatovýchSadZFormuláře ;
    sdmx-dimension:refPeriod ?period ;
    dqv:computedOn ?poskytovatel ;
    dqv:value ?početFormulář .

  # hack na zrychlení v rdf4j
  OPTIONAL {?poskytovatel foaf:name ?jméno_poskytovatele . }
  FILTER(BOUND(?jméno_poskytovatele))  
}
ORDER BY ?jméno_poskytovatele

Expected Behavior

The query should execute correctly and the encoding should not be mangled.

Steps To Reproduce

No response

Version

3.7.4, 4.0.0M2

Are you interested in contributing a solution yourself?

No

Anything else?

No response

abrokenjester commented 2 years ago

For what it's worth, this is mentioned in the RDF4J documentation for the Workbench, which suggests reconfiguring Tomcat as a workaround (see https://rdf4j.org/documentation/tools/server-workbench/#configuring-rdf4j-workbench-for-utf-8-support ).

In short: uncomment the setCharacterEncodingFilter filter in conf/web.xml of your Tomcat installation, then restart Tomcat.

abrokenjester commented 2 years ago

~I also note that I cannot reproduce the issue locally.~ EDIT: managed to reproduce it now on a locally running Tomcat (8.5). My earlier attempt used a recent Docker image, and our Docker image already applies the suggested fix for POST requests in its Tomcat config.

abrokenjester commented 2 years ago

The core of the problem seems to be in how the jQuery frontend communicates with the Workbench servlet. It uses standard form encoding to submit POST requests, and Tomcat by default uses ISO-8859-1 to decode form-encoded data. The workaround suggested in the documentation is to tweak Tomcat to use UTF-8 for form-encoded data, but perhaps we should look into an approach that gives us more control over the chosen character encoding ourselves.
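
To illustrate the mismatch described above, here is a minimal standalone sketch (not Workbench or Tomcat code) that percent-encodes one of the query's variable names as UTF-8, the way a browser would for a UTF-8 page, and then decodes it once as ISO-8859-1 and once as UTF-8. The class and method names are hypothetical, and the sketch assumes Java 10 or later for the `Charset` overloads of `URLEncoder` and `URLDecoder`.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only mimics the encode/decode steps,
// it does not call into Tomcat or RDF4J.
public class FormEncodingDemo {

    // Percent-encode a form value using UTF-8 bytes (what the browser sends).
    public static String encodeAsBrowser(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Decode the form value as ISO-8859-1 (the default described above).
    public static String decodeAsLatin1(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.ISO_8859_1);
    }

    // Decode the form value as UTF-8 (the effect of the documented workaround).
    public static String decodeAsUtf8(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "?početCKAN" written with a Unicode escape so the source file
        // encoding does not matter.
        String variable = "?po\u010detCKAN";

        String wire = encodeAsBrowser(variable);
        System.out.println("on the wire:   " + wire);

        // Each UTF-8 byte of the non-ASCII character becomes its own
        // ISO-8859-1 character, so the variable name is mangled.
        String mangled = decodeAsLatin1(wire);
        System.out.println("round trip OK with ISO-8859-1? " + mangled.equals(variable));

        // Decoding with the same charset that was used for encoding
        // restores the original text.
        String restored = decodeAsUtf8(wire);
        System.out.println("round trip OK with UTF-8?      " + restored.equals(variable));
    }
}
```

The two bytes of `č` (0xC4 0x8D) come back as two separate characters under ISO-8859-1, which would be consistent with both the mangled query text and the lexical error reported by the parser.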