SWI-Prolog / packages-semweb

The SWI-Prolog RDF store

sparql_read_xml_result/2 has "space(remove)", which removes whitespace (including newlines) from literals #99

Closed: koo5 closed this issue 3 years ago

koo5 commented 3 years ago

This causes the returned data to differ from what the server sent whenever a triple has a literal that contains newlines or multiple consecutive spaces.
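To make the symptom concrete, here is a small Python sketch. Python is used only as a neutral illustration, since the client in question is written in Prolog. It parses a hypothetical SPARQL XML result whose literal contains a newline and a double space, then applies `remove_space`, a hypothetical helper that approximates a whitespace-removing parse mode by collapsing runs of whitespace and trimming the ends. That approximation is an assumption chosen to match the reported symptom. The exact `space(remove)` semantics are defined in the SWI-Prolog sgml package documentation.

```python
import xml.etree.ElementTree as ET

SPARQL_NS = "{http://www.w3.org/2005/sparql-results#}"

# Hypothetical SPARQL XML result: one literal with a newline and a double space.
RESULT_XML = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="o"/></head>
  <results>
    <result>
      <binding name="o"><literal>line one\nline  two</literal></binding>
    </result>
  </results>
</sparql>"""


def literals(xml_text):
    """Return every <literal> value exactly as it appears in the XML."""
    root = ET.fromstring(xml_text)  # ElementTree keeps text content verbatim
    return [lit.text or "" for lit in root.iter(SPARQL_NS + "literal")]


def remove_space(text):
    """Hypothetical approximation of a whitespace-removing parser mode:
    collapse every run of whitespace to one space and trim both ends."""
    return " ".join(text.split())


sent = literals(RESULT_XML)[0]
received = remove_space(sent)
print(repr(sent))      # 'line one\nline  two'
print(repr(received))  # 'line one line two'
```

The two values differ, which is the kind of mismatch between server response and client result that this issue reports.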

koo5 commented 3 years ago

I use sudo tcpdump -A -s 0 'host dbpedia.org' to observe the traffic. Here is, finally, a self-contained example: https://github.com/koo5/hackery2/blob/master/src/data/swipl/sparql/bug2.pl. You can see that the literal is returned without newlines unless you remove application/sparql-results+xml from the types listed in the Accept header.

koo5 commented 3 years ago

Here is a query that returns more literals containing newlines:

select distinct ?s ?o where {
  ?s rdfs:comment ?o.
  filter contains(?o,"\n")
}
limit 100

JanWielemaker commented 3 years ago

Thanks. Code I can run saves a lot of time compared to hunting for a problematic triple, or writing a Turtle file and loading it into a server. This should be fixed by commit ef7ee735df9445b061cecc23c5d707d51e470edd. There is no test suite for the SPARQL client, so I hope the rest of the Prolog code now properly skips all the whitespace that the XML parser no longer removes. The two tests you provided work.
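The fix moves whitespace handling out of the XML parser and into the code that walks the parsed result. A minimal Python sketch of that division of labour, assuming a DOM in which layout whitespace appears as text nodes (here `xml.dom.minidom`, standing in for the Prolog parser's output): structural code skips whitespace-only text nodes between elements, while literal values are read verbatim. The function names and sample document are hypothetical; this is not the code from the commit.

```python
from xml.dom import minidom

# Hypothetical, indented SPARQL XML result; the literal holds a newline and a double space.
RESULT_XML = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="o"/>
  </head>
  <results>
    <result>
      <binding name="o">
        <literal>line one\nline  two</literal>
      </binding>
    </result>
  </results>
</sparql>"""


def element_children(node):
    """Children of a structural element, skipping whitespace-only text
    nodes that a whitespace-preserving parser leaves between tags."""
    return [
        child for child in node.childNodes
        if not (child.nodeType == child.TEXT_NODE and not child.data.strip())
    ]


def node_text(value_el):
    """Text of a value element (literal, uri, bnode), taken verbatim:
    no stripping, so newlines and runs of spaces survive."""
    return "".join(
        n.data for n in value_el.childNodes if n.nodeType == n.TEXT_NODE
    )


def bindings(xml_text):
    """Return one dict per <result>, mapping variable name to value text."""
    sparql = minidom.parseString(xml_text).documentElement
    results = [c for c in element_children(sparql) if c.tagName == "results"][0]
    rows = []
    for result in element_children(results):
        row = {}
        for binding in element_children(result):
            (value_el,) = element_children(binding)  # exactly one value element
            row[binding.getAttribute("name")] = node_text(value_el)
        rows.append(row)
    return rows


print(bindings(RESULT_XML))
```

Keeping the parser whitespace-preserving means every place that walks structure must tolerate whitespace nodes, which is why missing a spot is a risk when there is no test suite.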

koo5 commented 3 years ago

Thanks a lot. I should test it, eventually.