Closed x-fran closed 8 years ago
I noticed the same issue while testing FREME NER with very simple text.
For example, no entities are spotted in the text `Welcome to Dublin`, and only Dublin (but not Ireland) is recognized as an entity in the text `Dublin is in Ireland`. The e-Entity service with the DBpedia Spotlight engine properly recognizes all entities.
First curl request
C:\Users\Martab\curl\nossl>curl -X POST --header "Content-Type: " "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?input=Welcome%20to%20Dublin&informat=text&outformat=turtle&language=en&dataset=dbpedia"
Response
@prefix dbpedia-fr: <http://fr.dbpedia.org/resource/> .
@prefix dbc: <http://dbpedia.org/resource/Category:> .
@prefix dbpedia-es: <http://es.dbpedia.org/resource/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix dbpedia-de: <http://de.dbpedia.org/resource/> .
@prefix dbpedia-ru: <http://ru.dbpedia.org/resource/> .
@prefix freme-onto: <http://freme-project.eu/ns#> .
@prefix dbpedia-nl: <http://nl.dbpedia.org/resource/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dbpedia-it: <http://it.dbpedia.org/resource/> .
<http://freme-project.eu/#char=0,17>
a nif:String , nif:Context , nif:RFC5147String ;
nif:beginIndex "0"^^xsd:int ;
nif:endIndex "17"^^xsd:int ;
nif:isString "Welcome to Dublin"^^xsd:string .
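For anyone reproducing this, the request URL can be assembled programmatically so the input text is percent-encoded consistently. This is only a minimal sketch: the endpoint and query parameters are copied from the curl request above, while the helper name `build_freme_ner_url` is hypothetical and not part of FREME.

```python
from urllib.parse import urlencode, quote

# Endpoint copied from the curl requests in this thread.
BASE_URL = "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents"


def build_freme_ner_url(text, language="en", dataset="dbpedia"):
    """Return the request URL with the input text percent-encoded.

    Hypothetical helper for illustration; parameter names mirror the
    query string used in the curl requests.
    """
    params = {
        "input": text,
        "informat": "text",
        "outformat": "turtle",
        "language": language,
        "dataset": dataset,
    }
    # quote_via=quote encodes spaces as %20 (as in the curl requests)
    # instead of the default '+'.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


print(build_freme_ner_url("Welcome to Dublin"))
```

The resulting string can be passed to curl as the quoted URL argument, which avoids the URL being broken across lines when pasted into a console.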
Second curl request
curl -X POST --header "Content-Type: " "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?input=Dublin%20is%20in%20Ireland&informat=text&outformat=turtle&language=en&dataset=dbpedia"
Response
@prefix dbpedia-fr: <http://fr.dbpedia.org/resource/> .
@prefix dbc: <http://dbpedia.org/resource/Category:> .
@prefix dbpedia-es: <http://es.dbpedia.org/resource/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix dbpedia-de: <http://de.dbpedia.org/resource/> .
@prefix dbpedia-ru: <http://ru.dbpedia.org/resource/> .
@prefix freme-onto: <http://freme-project.eu/ns#> .
@prefix dbpedia-nl: <http://nl.dbpedia.org/resource/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dbpedia-it: <http://it.dbpedia.org/resource/> .
<http://freme-project.eu/#char=0,20>
a nif:String , nif:Context , nif:RFC5147String ;
nif:beginIndex "0"^^xsd:int ;
nif:endIndex "20"^^xsd:int ;
nif:isString "Dublin is in Ireland"^^xsd:string .
<http://freme-project.eu/#char=0,6>
a nif:RFC5147String , nif:Word , nif:String , nif:Phrase ;
nif:anchorOf "Dublin"^^xsd:string ;
nif:beginIndex "0"^^xsd:int ;
nif:endIndex "6"^^xsd:int ;
nif:referenceContext <http://freme-project.eu/#char=0,20> ;
itsrdf:taClassRef <http://nerd.eurecom.fr/ontology#Location> ;
itsrdf:taConfidence "0.9885517011780088"^^xsd:double ;
itsrdf:taIdentRef dbpedia:Dublin .
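To check quickly which entities a response contains, the Turtle output can be scanned for statements that carry both `nif:anchorOf` and `itsrdf:taIdentRef`. The sketch below is a rough regex-based check, not a real Turtle parser (an RDF library would be more robust); the function name `spotted_entities` is hypothetical and the sample is a trimmed copy of the response above.

```python
import re


def spotted_entities(turtle):
    """Return (surface form, taIdentRef) pairs found in a Turtle response.

    Rough illustration only: splits the text on statement-terminating
    ' .' at line ends and looks for the two predicates with regexes.
    """
    entities = []
    for block in re.split(r"\s\.\s*(?:\n|$)", turtle):
        anchor = re.search(r'nif:anchorOf\s+"([^"]*)"', block)
        ident = re.search(r"itsrdf:taIdentRef\s+(\S+)", block)
        if anchor and ident:
            entities.append((anchor.group(1), ident.group(1)))
    return entities


# Trimmed copy of the second response in this thread.
sample = '''<http://freme-project.eu/#char=0,20>
a nif:String , nif:Context , nif:RFC5147String ;
nif:isString "Dublin is in Ireland"^^xsd:string .
<http://freme-project.eu/#char=0,6>
a nif:RFC5147String , nif:Word , nif:String , nif:Phrase ;
nif:anchorOf "Dublin"^^xsd:string ;
nif:referenceContext <http://freme-project.eu/#char=0,20> ;
itsrdf:taIdentRef dbpedia:Dublin .
'''

print(spotted_entities(sample))  # [('Dublin', 'dbpedia:Dublin')]
```

Run against the first response, which contains only the context resource, the same function returns an empty list, matching the observation that nothing was spotted in `Welcome to Dublin`.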
> only Dublin (but not Ireland) is recognized as an entity in the text `Dublin is in Ireland`. The e-Entity service with DBpedia Spotlight engine properly recognizes all entities.

This should be solved with https://github.com/freme-project/freme-ner/commit/eae252d765f680c7622dcda953e6cf69371cd3ac. I tried it, and both Dublin and Ireland are spotted. See http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents/?informat=text&input=Dublin%20is%20in%20Ireland.&outformat=turtle&language=en&dataset=dbpedia
@xFran
> Germany/German is mentioned 13 times in the content so this entity is OK. China/China’s is mentioned 4 times and was not spotted as an entity.

This should be fixed with https://github.com/freme-project/freme-ner/commit/eae252d765f680c7622dcda953e6cf69371cd3ac
I tried, and got China spotted.
> I tried, and got China spotted.

How is that? Did you fix something? Was my cURL request wrong? Please send me your text document along with the cURL request so I can test it. Should we close the issue then?
Yes, it was fixed yesterday. That was a bug.
I will test it again, and if it is OK I will close the issue.
This is the file with the content: test.txt
Some information about the file: ![image](https://cloud.githubusercontent.com/assets/3188361/10582324/2fb29754-767f-11e5-813e-c28b65a08211.png)
cUrl command
I tried something similar to the open issue https://github.com/freme-project/freme-ner/issues/46: after removing the first paragraph from the content, I get only one entity.
Germany/German is mentioned 13 times in the content, so this entity is OK. China/China’s is mentioned 4 times and was not spotted as an entity.
This is the link where I got the content: http://www.ft.com/intl/cms/s/0/7786936c-73fc-11e5-bdb1-e6e4767162cc.html#axzz3p07brijW
Same content in OpenCalais: ![image](https://cloud.githubusercontent.com/assets/3188361/10582768/5f7e8950-7681-11e5-9cf0-63b14df1b9aa.png)