Open bjdmeest opened 8 years ago
This may be an aspect of the underlying e-Internationalisation service. Adding @borriellom to see what she thinks. If you want to keep line breaks in a <p> HTML element, adding a <br> element may help.
The problem is not HTML-specific; the result is the same for plain-text input as well. Greater Athens is detected, which is good, but the nif:anchorOf value does not match the original text.
For the Athenians the most popular way of dividing the City proper is through its neighbourhoods such as Pagkrati, Ambelokipi, Exarcheia, Patissia, Ilissia, Petralona, Koukaki and Kypseli, each with its own distinct history and characteristics.
The Athens municipality also forms the core and center of Greater
Athens which consists of the Athens municipality and 34 more
municipalities, which are divided in the four regional units (North,
West, Central and South Athens) mentioned above.
Thanks for pointing this out, @bjdmeest, so this is indeed a different issue.
The same problem occurs in DBpedia Spotlight.
@bjdmeest this is related to https://github.com/freme-project/freme-ner/issues/59. See the discussion and the solution there.
If that does not solve the issue, feel free to reopen it so we can further investigate.
Actually, the curl request below (without --data or --data-binary) also does not return the correct result. It returns
nif:anchorOf "Greater Athens"^^xsd:string ;
instead of
nif:anchorOf """Greater
Athens"""^^xsd:string ;
curl -X POST --header "Content-Type: text/html" --header "Accept: text/n3" "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?input=The%20Athens%20municipality%20also%20forms%20the%20core%20and%20center%20of%20Greater%0AAthens%20which%20consists%20of%20the%20Athens%20municipality%20and%2034%20more%0Amunicipalities&informat=text&outformat=turtle&language=en&dataset=dbpedia&mode=all"
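To confirm that the query string above really carries a line break, the input parameter can be decoded locally. This is a minimal sketch in JavaScript; the variable names are illustrative and only a fragment of the input is used.

```javascript
// Fragment of the URL-encoded `input` parameter from the curl request above.
const encoded = 'core%20and%20center%20of%20Greater%0AAthens%20which%20consists';

// %0A is the percent-encoding of a line feed (U+000A).
const decoded = decodeURIComponent(encoded);

// True when "Greater" and "Athens" are separated by a newline, not a space.
const hasLineBreak = decoded.includes('Greater\nAthens');
console.log(hasLineBreak);
```

So the text sent to the service does contain "Greater", a newline, then "Athens", which is what nif:anchorOf would be expected to reproduce.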
Neither does the ajax request below:
$.ajax('http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=all',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'text/html'
    },
    data: '<p>The Athens municipality also forms the core and center of Greater\nAthens which consists of the Athens municipality and 34 more\nmunicipalities</p>',
    success: function (data) {
      console.log(data)
    },
    crossDomain: true
  })
Thanks, we will investigate this and get back to you.
@sandroacoelho can you look at it? See the explanation below.
The following request, where doc.txt contains the document:
curl -v "http://rv2622.1blu.de:8081/api/entities?format=TTL&language=en&dataset=dbpedia" --data-binary @doc.txt
In the results we get nif:anchorOf "Greater Athens"^^xsd:string ;
but it should be nif:anchorOf "Greater\nAthens"^^xsd:string ;
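The mismatch between the expected and returned values above can be checked mechanically by comparing nif:anchorOf with the slice of the original text it claims to cover. The sketch below is only an illustration: anchorMatches is a hypothetical helper, the offsets are computed locally rather than read from a real response, and it assumes plain ASCII text so that JavaScript string indices agree with the NIF offsets.

```javascript
// Hypothetical helper: true when anchorOf equals the slice of the original
// text between beginIndex (inclusive) and endIndex (exclusive).
function anchorMatches(text, beginIndex, endIndex, anchorOf) {
  return text.slice(beginIndex, endIndex) === anchorOf;
}

// The plain-text input used in this issue, with its original line breaks.
const text = 'The Athens municipality also forms the core and center of Greater\n' +
  'Athens which consists of the Athens municipality and 34 more\n' +
  'municipalities';

// Offsets computed locally for illustration; a real check would read
// nif:beginIndex and nif:endIndex from the returned NIF.
const begin = text.indexOf('Greater');
const end = begin + 'Greater\nAthens'.length;

const expectedOk = anchorMatches(text, begin, end, 'Greater\nAthens');
const reportedOk = anchorMatches(text, begin, end, 'Greater Athens');
console.log(expectedOk, reportedOk);
```

A check like this passes for the value with the newline and fails for the value with a space, which is the discrepancy reported here.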
When I make the following request with the HTML below, the whitespace in the nif:anchorOf values of the returned NIF is incorrect. For example, nif:anchorOf Greater Athens is returned instead of nif:anchorOf Greater \nAthens.
curl -X POST --header "Content-Type: text/html" --header "Accept: application/ld+json" -d "in.html" "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=all"