freme-project / e-services

Apache License 2.0

[e-link] special characters #33

Closed: jnehring closed this issue 7 years ago

jnehring commented 7 years ago

We get a lot of error messages in the log files caused by special characters in e-Link, e.g.:

ERROR   2016-08-30 11:04:04,688 [http-nio-8006-exec-5] eu.freme.eservices.elink.api.DataEnricher  - eu.freme.common.exception.InternalServerErrorException: Could not process the enrichment result from the endpoint=http://rv2622.1blu.de:8890/sparql/ executing the query=CONSTRUCT {  <http://dbpedia.org/resource/America³> <http://www.w3.org/2000/01/rdf-schema#label> ?o . }   WHERE {  <http://dbpedia.org/resource/America³> <http://www.w3.org/2000/01/rdf-schema#label> ?o .  FILTER langMatches( lang(?o), "en" ) }. Error message: [line: 3, col: 16] Unknown char: ³(179;0x00B3)
        at eu.freme.eservices.elink.api.DataEnricher.enrichWithTemplateSPARQL(DataEnricher.java:128)

The problem is that the resource URL is not valid: the character ³ needs to be URL-encoded, and therefore Jena fails.
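A minimal Python sketch of the encoding proposed above, assuming it is applied to the entity IRI before that IRI is substituted into the template query shown in the log. `to_ascii_uri` and `build_label_query` are hypothetical helpers, not e-Link code, and the real template placeholder syntax is not shown here. Note that if the endpoint stores the unencoded IRI (as the working query later in this thread suggests), the encoded URI names a different resource and the lookup would return nothing.

```python
from urllib.parse import quote

# RFC 3986 reserved characters (they structure the URI) plus '%',
# so percent-escapes already present in the input are not double-encoded.
_URI_SAFE = ":/?#[]@!$&'()*+,;=%"


def to_ascii_uri(iri: str) -> str:
    """Percent-encode every character outside the safe set as UTF-8 bytes."""
    return quote(iri, safe=_URI_SAFE)


def build_label_query(entity: str) -> str:
    """Fill the CONSTRUCT query seen in the log with the encoded entity URI.

    Hypothetical helper: the actual e-Link template mechanism may differ.
    """
    uri = to_ascii_uri(entity)
    return (
        "CONSTRUCT { <%s> <http://www.w3.org/2000/01/rdf-schema#label> ?o . } "
        "WHERE { <%s> <http://www.w3.org/2000/01/rdf-schema#label> ?o . "
        'FILTER langMatches( lang(?o), "en" ) }' % (uri, uri)
    )


if __name__ == "__main__":
    print(to_ascii_uri("http://dbpedia.org/resource/America³"))
    # prints http://dbpedia.org/resource/America%C2%B3
```

Because `%` is in the safe set, the function can also be applied to IRIs that are already partly encoded without changing their existing escapes.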

I played around with the DBpedia SPARQL endpoint and was not able to formulate a SPARQL query that fetches any information about http://dbpedia.org/resource/America³.

What do we do? Can we fix it?

cURL to reproduce the issue:

curl -X POST -H "Cache-Control: no-cache" -d '@prefix dc:    <http://purl.org/dc/elements/1.1/> .
@prefix prov:  <http://www.w3.org/ns/prov#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .

<http://127.0.0.1:9995/spotlight#char=5,7> itsrdf:taIdentRef <http://dbpedia.org/resource/Germany> .

<http://freme-project.eu/#char=10,17> itsrdf:taIdentRef <http://dbpedia.org/resource/America³> .' "http://api.freme-project.eu/current/e-link/documents?outformat=turtle&templateid=4496&informat=turtle"
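To see which entities in a payload like this one will trigger the error, the input can be scanned for IRIs that contain non-ASCII characters. This is a rough Python sketch: `non_ascii_iris` is a hypothetical helper, and it uses a regular expression where production code would use a real Turtle parser (for example Jena's).

```python
import re

# An IRI written in angle brackets, e.g. <http://dbpedia.org/resource/Germany>.
# Rough approximation: it does not parse Turtle, so bracketed text inside
# string literals would also be matched.
_IRI_REF = re.compile(r"<([^<>\s]+)>")


def non_ascii_iris(turtle: str) -> list[str]:
    """Return the distinct bracketed IRIs that contain non-ASCII characters,
    in order of first appearance."""
    found: list[str] = []
    for iri in _IRI_REF.findall(turtle):
        if not iri.isascii() and iri not in found:
            found.append(iri)
    return found


if __name__ == "__main__":
    payload = """
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
<http://127.0.0.1:9995/spotlight#char=5,7> itsrdf:taIdentRef <http://dbpedia.org/resource/Germany> .
<http://freme-project.eu/#char=10,17> itsrdf:taIdentRef <http://dbpedia.org/resource/America³> .
"""
    print(non_ascii_iris(payload))
```

On this abridged payload it reports only the America³ IRI; the Germany resource and the prefix IRIs are pure ASCII.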
m1ci commented 7 years ago

> I played around with the DBpedia SPARQL endpoint and was not able to formulate a SPARQL query that fetches any information about http://dbpedia.org/resource/America³

It works for me, try http://dbpedia.org/sparql/?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+*%0D%0Awhere+%7B%0D%0A%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FAmerica%C2%B3%3E+%3Fp+%3Fo+.%0D%0A%7D&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=30000&debug=on

The SPARQL query is:

select *
where {
<http://dbpedia.org/resource/America³> ?p ?o .
}
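The link above is this query sent through the SPARQL protocol's GET binding: the query text itself contains the raw ³, and it is percent-encoded (as %C2%B3) only in the HTTP request URL. A Python sketch that rebuilds the `default-graph-uri` and `query` parameters of that link; `sparql_get_url` is a hypothetical helper, and the link's other, endpoint-specific parameters (`format`, `timeout`, `debug` and so on) are left out.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def sparql_get_url(endpoint: str, query: str,
                   extra: Optional[Dict[str, str]] = None) -> str:
    """Build a SPARQL-protocol GET URL with the query form-encoded.

    The query text keeps the raw IRI; urlencode() turns non-ASCII
    characters into UTF-8 percent-escapes (³ -> %C2%B3) only for transport.
    """
    args = dict(extra or {})  # e.g. default-graph-uri, placed before query
    args["query"] = query
    # safe="*" leaves '*' unescaped, as in the browser-generated link.
    return endpoint + "?" + urlencode(args, safe="*")


if __name__ == "__main__":
    # CRLF line endings, as submitted by the endpoint's HTML form.
    query = ("select *\r\nwhere {\r\n"
             "<http://dbpedia.org/resource/America³> ?p ?o .\r\n}")
    url = sparql_get_url("http://dbpedia.org/sparql/", query,
                         {"default-graph-uri": "http://dbpedia.org"})
    print(url)
    # Decoding the URL gives back the query with the raw ³.
    decoded = parse_qs(urlsplit(url).query)["query"][0]
    assert decoded == query
```

With CRLF line endings the output matches the `default-graph-uri` and `query` parameters of the link character for character, and decoding it yields the original query text with the unencoded IRI, which is what the endpoint parses.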
jnehring commented 7 years ago

It does not work in the SNORQL endpoint: http://dbpedia.org/snorql/?query=select+*%0D%0Awhere+%7B%0D%0A%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FAmerica%C2%B3%3E+%3Fp+%3Fo+.%0D%0A%7D

But anyway, it is more important to deal with this in e-Link.

m1ci commented 7 years ago

@jnehring is there a template that uses the SNORQL DBpedia interface?

jnehring commented 7 years ago

According to http://api.freme-project.eu/current/e-link/templates, no template uses the SNORQL DBpedia interface.

From the error message in the first comment, I conclude that the error comes from this template: http://{{baseUrl}}/e-link/templates/4496

This template is used by WRIPL, which also explains why the error occurs so often.

m1ci commented 7 years ago

As for the error:

Try

curl -v -H "Content-Type: text/turtle" -H "Accept: text/turtle" "http://api.freme-project.eu/current/e-link/documents?templateid=4496" -d @data.txt

with data.txt

It contains two http://dbpedia.org/resource/America³ entities.

For me this works.

jnehring commented 7 years ago

I will test this.

jnehring commented 7 years ago

The cURL request works now, and the error message has not appeared in the log file for two weeks, so this issue is fixed.