dydra / support

Functions ignore variable-bound string arguments containing accented characters #33

Closed: knoan closed 9 years ago

knoan commented 9 years ago

Strings containing Unicode diacritics in this test dataset:

prefix : <urn:example:>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

insert data {
    :x rdfs:label "Australian Gift Network, Co".
    :y rdfs:label "Auto Canal+ Petit".
    :z rdfs:label "Auto Associés & Cie.".
}

aren't handled correctly by the following query, which returns unbound variables instead of the expected result:

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select ?l (strlen(?l) as ?v) {

  ?i rdfs:label ?l 

}

The problem appears to affect any function that can take string arguments (tested with str, strlen, ucase and coalesce).

The functions work as expected on constants:

select (strlen("Auto Associés & Cie." ) as ?v)  {}

or if values are provided through a VALUES clause:

select ?l (strlen(?l) as ?v)  {
    values ?l {
        "Australian Gift Network, Co"    
        "Auto Canal+ Petit"                   
        "Auto Associés & Cie."                
    }
}

lisp commented 9 years ago

i have not managed to reproduce this. that data appears to round-trip intact as:

http://dydra.com/james/test/unicode-labels.html

is there a particular response media type involved? when retrieved via that page, the json and the rdf encodings looked correct. what am i overlooking?

knoan commented 9 years ago

The issue is with the ?v variable missing the expected value for "Auto Associés & Cie.", as seen also in http://dydra.com/james/test/unicode-labels.html.

Expected:

?l                           ?v
Australian Gift Network, Co  27
Auto Canal+ Petit            17
Auto Associés & Cie.         21

Actual:

?l                           ?v
Australian Gift Network, Co  27
Auto Canal+ Petit            17
Auto Associés & Cie.
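
As a cross-check on the ?v column, the following is a minimal Python sketch, not part of the original report, that counts Unicode code points for the three labels. SPARQL's STRLEN counts characters rather than bytes, so a code-point count is the relevant comparison. The helper name `codepoint_lengths` is hypothetical, and the labels are assumed to be exactly as typed above; the thread does not say whether the dataset stored the é in precomposed (NFC) or decomposed (NFD) form, so both counts are shown.

```python
import unicodedata

# Labels from the test dataset; the accented character is written as an
# explicit escape so the precomposed form (U+00E9) is unambiguous.
labels = [
    "Australian Gift Network, Co",
    "Auto Canal+ Petit",
    "Auto Associ\u00e9s & Cie.",
]


def codepoint_lengths(text):
    """Return (NFC length, NFD length) of text, measured in code points."""
    nfc = unicodedata.normalize("NFC", text)
    nfd = unicodedata.normalize("NFD", text)
    return len(nfc), len(nfd)


for label in labels:
    nfc_len, nfd_len = codepoint_lengths(label)
    print(f"{label!r}: NFC={nfc_len} NFD={nfd_len}")
```

The two ASCII-only labels give 27 and 17 in either form. The third label gives 20 code points when é is precomposed and 21 when it is decomposed into e plus a combining acute accent, so the expected value of 21 quoted above corresponds to the decomposed form (or to a UTF-8 byte count).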

artob commented 9 years ago

This has now been remedied on dydra.com.

(The issue turned out to be a pathological corner case in a third-party Unicode library we make use of.)