freme-project / e-Entity

Apache License 2.0

FREME NER - Special characters encoding in taIdentRef URI #56

Closed borriellom closed 8 years ago

borriellom commented 8 years ago

Sometimes entity URIs contain special characters. These characters are encoded in the URI included in the NIF output, and I’m not sure the encoding is correct, because the link doesn’t work when I try to open the linked page.

HTTP request

curl -X POST -H "Content-Type: " "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?input=This%20meant%20the%20Aos%20S%C3%AD%20(pronounced%20ees%20shee)%2C%20the%20'spirits'%20or%20'fairies'%2C%20could%20more%20easily%20come%20into%20our%20world%20and%20were%20particularly%20active.&informat=text&outformat=turtle&language=en&dataset=dbpedia&mode=all"

HTTP response

@prefix dbpedia-fr: <http://fr.dbpedia.org/resource/> .
@prefix dbc:   <http://dbpedia.org/resource/Category:> .
@prefix dbpedia-es: <http://es.dbpedia.org/resource/> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix dbpedia-de: <http://de.dbpedia.org/resource/> .
@prefix dbpedia-ru: <http://ru.dbpedia.org/resource/> .
@prefix freme-onto: <http://freme-project.eu/ns#> .
@prefix dbpedia-nl: <http://nl.dbpedia.org/resource/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dbpedia-it: <http://it.dbpedia.org/resource/> .

<http://freme-project.eu/#char=0,140>
        a               nif:String , nif:Context , nif:RFC5147String ;
        nif:beginIndex  "0"^^xsd:int ;
        nif:endIndex    "140"^^xsd:int ;
        nif:isString    "This meant the Aos Sí (pronounced ees shee), the 'spirits' or 'fairies', could more easily come into our world and were particularly active."^^xsd:string .

<http://freme-project.eu/#char=15,21>
        a                     nif:RFC5147String , nif:String , nif:Word , nif:Phrase ;
        nif:anchorOf          "Aos Sí"^^xsd:string ;
        nif:beginIndex        "15"^^xsd:int ;
        nif:endIndex          "21"^^xsd:int ;
        nif:referenceContext  <http://freme-project.eu/#char=0,140> ;
        itsrdf:taClassRef     <http://www.w3.org/2002/07/owl#Thing> ;
        itsrdf:taConfidence   "0.4859081423223223"^^xsd:double ;
        itsrdf:taIdentRef     <http://dbpedia.org/resource/Aos_S%25C3%25AD> .

The link http://dbpedia.org/resource/Aos_S%25C3%25AD doesn’t work. The correct encoding should be http://dbpedia.org/page/Aos_S%C3%AD

m1ci commented 8 years ago

The link http://dbpedia.org/resource/Aos_S%25C3%25AD doesn’t work.

It works; you only need to decode it to http://dbpedia.org/resource/Aos_Sí.
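
A minimal sketch of what "decode" means here, assuming Python's standard `urllib.parse`. In the returned URI the `%` sign itself is escaped as `%25`, so one decoding pass yields the usual percent-encoded form and a second pass yields the readable IRI. The helper name `fully_decode` is hypothetical and not part of FREME.

```python
from urllib.parse import unquote


def fully_decode(uri: str, max_rounds: int = 5) -> str:
    """Percent-decode repeatedly until the string stops changing.

    Hypothetical helper for illustration only; max_rounds is an
    arbitrary safety limit.
    """
    for _ in range(max_rounds):
        decoded = unquote(uri)
        if decoded == uri:
            break
        uri = decoded
    return uri


# The taIdentRef value from the NIF output above.
returned = "http://dbpedia.org/resource/Aos_S%25C3%25AD"

once = unquote(returned)        # "%25" becomes "%", leaving "%C3%AD"
twice = fully_decode(returned)  # "%C3%AD" becomes the UTF-8 character "í"

print(once)
print(twice)
```

Decoding until the string stops changing is only safe when the identifier is not expected to contain a literal `%`; otherwise a fixed number of passes is the more cautious choice.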

The correct encoding should be http://dbpedia.org/page/Aos_S%C3%AD

No, http://dbpedia.org/page/Aos_S%C3%AD is the HTML representation of the resource, which makes it an "information resource". http://dbpedia.org/resource/Aos_S%C3%AD is the correct resource identifier, and it is a "non-information resource". If you dereference http://dbpedia.org/resource/Aos_Sí and ask for HTML, you'll get an information resource describing the resource in HTML. If you ask for Turtle, you'll get an information resource in Turtle.

For HTML, try the following and look at the Location header:

curl -v http://dbpedia.org/resource/Aos_Sí -H "Accept: text/html"

For Turtle, try the following and look at the Location header:

curl -v http://dbpedia.org/resource/Aos_Sí -H "Accept: text/turtle"

For more on dereferencing HTTP URIs and on information and non-information resources, read http://wifo5-03.informatik.uni-mannheim.de/bizer/pub/LinkedDataTutorial/#Terminology (section "Dereferencing HTTP URIs").
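
The two curl calls above can be approximated in Python with the standard library alone. This is a sketch under assumptions: `build_request`, `redirect_target` and `NoRedirect` are hypothetical names, and whether DBpedia still answers with a redirect and a Location header is not guaranteed.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Do not follow redirects, so the Location header stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError instead of following.
        return None


def build_request(iri: str, accept: str) -> urllib.request.Request:
    """Build a GET request for the IRI with the given Accept header."""
    # Percent-encode non-ASCII characters; keep ":", "/" and existing escapes.
    url = quote(iri, safe=":/%")
    return urllib.request.Request(url, headers={"Accept": accept})


def redirect_target(iri: str, accept: str, timeout: float = 10.0):
    """Return the Location header of the response, or None if absent."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_request(iri, accept), timeout=timeout) as resp:
            return resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        # With redirects disabled, a 3xx response surfaces as HTTPError.
        return err.headers.get("Location")


# Building the request needs no network access.
req = build_request("http://dbpedia.org/resource/Aos_Sí", "text/turtle")
print(req.full_url)
print(req.get_header("Accept"))
```

Calling `redirect_target` on the same IRI with `"text/html"` and then with `"text/turtle"` performs the actual lookups. Comparing the two return values shows which representation the server selects for each Accept header.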

borriellom commented 8 years ago

Thanks for the explanation. So this is not an actual bug, is it?

m1ci commented 8 years ago

IMO, not.

jnehring commented 8 years ago

Oops, sorry, I mislabeled this as an error. I suggest labeling it as invalid and closing the issue.

m1ci commented 8 years ago

Makes sense to me.