SWI-Prolog / packages-semweb

The SWI-Prolog RDF store

`rdf_load/[1,2]' changes the request IRI the user supplies #20

Open wouterbeek opened 8 years ago

wouterbeek commented 8 years ago

rdf_load/[1,2] performs IRI normalization before sending an HTTP request. IRI normalization introduces unnecessary percent escaping that is not supported by all servers, occasionally resulting in unsuccessful requests.

Reproducible case:

?- [library(semweb/rdf_db)].
?- [library(semweb/rdf_http_plugin)].
?- rdf_load('http://dbpedia.org/resource/Category:Politics').
% Parsed "http://dbpedia.org/resource/Category%3APolitics" in 0.00 sec; 0 triples
true.

If you visit http://dbpedia.org/resource/Category:Politics then you see that there are triples there.
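To make the mismatch concrete, here is a minimal JavaScript sketch comparing the IRI as supplied with the one reported in the `Parsed ...` message. It assumes a runtime with the WHATWG `URL` class (Node.js or a browser) and only inspects the strings; it does not reproduce what `rdf_load` does internally, and it makes no claim about how any particular server treats the two forms.

```javascript
// IRI exactly as it was passed to rdf_load/1 in the report above.
const supplied = "http://dbpedia.org/resource/Category:Politics";
// IRI as it appears in the "Parsed ..." message.
const reported = "http://dbpedia.org/resource/Category%3APolitics";

// The URL parser keeps ":" in a path as-is and does not decode "%3A",
// so the two request paths are different strings.
const suppliedPath = new URL(supplied).pathname;
const reportedPath = new URL(reported).pathname;

console.log(suppliedPath);                  // /resource/Category:Politics
console.log(reportedPath);                  // /resource/Category%3APolitics
console.log(suppliedPath === reportedPath); // false

// They only coincide after percent-decoding, which a server
// may or may not apply before looking up the resource.
console.log(decodeURIComponent(reportedPath) === suppliedPath); // true
```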

JanWielemaker commented 8 years ago

Great. In a previous round, we decided that : must be escaped to prevent relative URIs from being read as absolute ones. The above makes it really hard to tell when you can or must escape. rdf_load escapes so that it can process the unescaped IRIs in the triples ...

wouterbeek commented 8 years ago

I'm not clear on the benefit of escaping : in places where this is not required. The only benefit that I can think of is processing speed, since the syntax for relative IRIs is recognizably different than the one for absolute IRIs.

JanWielemaker commented 8 years ago

It is rather odd. RFC 3986 indeed allows ":" in a path segment. However, in a relative URL, a ":" in the first path segment makes the reference ambiguous (it can also be read as an absolute URL). This problem was raised by Samer a while ago and led to the decision to escape the ":". Looking at JavaScript, we get

> encodeURIComponent("aap:noot")
"aap%3Anoot"
> encodeURI("http://www.example.com/aap:noot")
"http://www.example.com/aap:noot"

I'm a little lost :(
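The ambiguity discussed above can be sketched with the WHATWG `URL` class (assumed available, as in Node.js or a browser). The base URL and the `aap:noot` names are illustrative only. The sketch suggests that the colon is a problem only in the first segment of a relative reference; RFC 3986 section 4.2 names a `./` prefix as a way to keep such a reference relative, while percent-escaping the colon produces a different path string.

```javascript
// Hypothetical base URL, used only to resolve the relative references.
const base = "http://www.example.com/dir/";

// ":" in the first segment: "aap" is read as a scheme, so the
// reference is treated as absolute and the base is ignored.
const ambiguous = new URL("aap:noot", base);
console.log(ambiguous.protocol); // aap:
console.log(ambiguous.href);     // aap:noot

// Percent-escaping the colon keeps the reference relative,
// but the resulting path now contains "%3A".
const escaped = new URL("aap%3Anoot", base);
console.log(escaped.href); // http://www.example.com/dir/aap%3Anoot

// A "./" prefix also keeps it relative and leaves the colon intact.
const dotted = new URL("./aap:noot", base);
console.log(dotted.href); // http://www.example.com/dir/aap:noot

// A colon in a later segment is not ambiguous: no escaping is needed.
const later = new URL("resource/Category:Politics", base);
console.log(later.href); // http://www.example.com/dir/resource/Category:Politics
```

Under these assumptions, an absolute IRI such as the DBpedia one in the report never hits the ambiguous case, because its colon appears after the scheme and authority have already been parsed.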