freme-project / e-Entity

Apache License 2.0

FREME NER is not spotting entities #55

Closed x-fran closed 8 years ago

x-fran commented 8 years ago

This is the file with the content: test.txt

Some information about the file: image

curl command:

→ curl -v -d @test.txt "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents/?informat=text&outformat=json-ld&language=en&dataset=dbpedia&enrichement=dbpedia-categories" -H "Content-Type:"

I tried something similar to the open issue https://github.com/freme-project/freme-ner/issues/46 by removing the first paragraph from the content, and I get only one entity.

{
    "@id" : "http://freme-project.eu/#char=10,17",
    "@type" : [ "nif:Phrase", "nif:Word", "nif:RFC5147String", "nif:String" ],
    "nif:anchorOf" : "Germany",
    "beginIndex" : "10",
    "endIndex" : "17",
    "referenceContext" : "http://freme-project.eu/#char=0,2889",
    "taClassRef" : "http://nerd.eurecom.fr/ontology#Location",
    "itsrdf:taConfidence" : 0.5323128034518319,
    "taIdentRef" : "dbpedia:Germany"
  }

Germany/German is mentioned 13 times in the content, so this entity is OK. China/China’s is mentioned 4 times but was not spotted as an entity.
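
The mention counts above can be checked from the command line. This is a minimal sketch, assuming a grep that supports -o (GNU and BSD grep both do); count_mentions is a hypothetical helper name, and the two-line sample stands in for the real test.txt.

```shell
# count_mentions PATTERN FILE
# Print how many times PATTERN (an extended regex, matched
# case-insensitively) occurs in FILE. Hypothetical helper.
count_mentions() {
  grep -o -i -E "$1" "$2" | wc -l | tr -d ' '
}

# Stand-in sample; replace "$sample" with test.txt for the real check.
sample=$(mktemp)
printf '%s\n' \
  "Germany and China signed a deal." \
  "German exports to China rose." > "$sample"

count_mentions 'German(y)?' "$sample"   # counts both "Germany" and "German"
count_mentions 'China' "$sample"        # also matches inside "China's"

rm -f "$sample"
```

Comparing these counts with the entities returned by the service shows which frequent names were missed.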

This is the link where I got the content: http://www.ft.com/intl/cms/s/0/7786936c-73fc-11e5-bdb1-e6e4767162cc.html#axzz3p07brijW

The same content in OpenCalais: image

borriellom commented 8 years ago

I noticed the same issue while testing FREME NER with very simple text. For example, no entities are spotted in the text "Welcome to Dublin", and only Dublin (but not Ireland) is recognized as an entity in the text "Dublin is in Ireland". The e-Entity service with the DBpedia Spotlight engine properly recognizes all entities.

First curl request

C:\Users\Martab\curl\nossl>curl -X POST --header "Content-Type: " "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?input=Welcome%20to%20Dublin&informat=text&outformat=turtle&language=en&dataset=dbpedia"

Response

@prefix dbpedia-fr: <http://fr.dbpedia.org/resource/> .
@prefix dbc:   <http://dbpedia.org/resource/Category:> .
@prefix dbpedia-es: <http://es.dbpedia.org/resource/> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix dbpedia-de: <http://de.dbpedia.org/resource/> .
@prefix dbpedia-ru: <http://ru.dbpedia.org/resource/> .
@prefix freme-onto: <http://freme-project.eu/ns#> .
@prefix dbpedia-nl: <http://nl.dbpedia.org/resource/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dbpedia-it: <http://it.dbpedia.org/resource/> .

<http://freme-project.eu/#char=0,17>
        a               nif:String , nif:Context , nif:RFC5147String ;
        nif:beginIndex  "0"^^xsd:int ;
        nif:endIndex    "17"^^xsd:int ;
        nif:isString    "Welcome to Dublin"^^xsd:string .
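
The input in these requests is percent-encoded by hand (Welcome%20to%20Dublin). Below is a minimal sketch of building the same URL from plain text, assuming a POSIX shell. urlencode is a hypothetical helper (not part of curl or FREME) and handles ASCII input only. The endpoint and query parameters are copied from the request above. The block only prints the URL and sends nothing.

```shell
# urlencode STRING
# Percent-encode every character except RFC 3986 unreserved ones.
# Hypothetical helper; ASCII input only.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}            # everything after the first character
    c=${s%"$rest"}         # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

base="http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents"
url="$base?input=$(urlencode "Welcome to Dublin")&informat=text&outformat=turtle&language=en&dataset=dbpedia"
printf '%s\n' "$url"

# To send the request (needs network access to the endpoint):
# curl -X POST --header "Content-Type: " "$url"
```

Building the URL in a variable keeps it on one line, which avoids the console wrapping that can split a long URL in pasted commands.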

Second curl request

curl -X POST --header "Content-Type: " "http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents?input=Dublin%20is%20in%20Ireland&informat=text&outformat=turtle&language=en&dataset=dbpedia"

Response

@prefix dbpedia-fr: <http://fr.dbpedia.org/resource/> .
@prefix dbc:   <http://dbpedia.org/resource/Category:> .
@prefix dbpedia-es: <http://es.dbpedia.org/resource/> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix dbpedia-de: <http://de.dbpedia.org/resource/> .
@prefix dbpedia-ru: <http://ru.dbpedia.org/resource/> .
@prefix freme-onto: <http://freme-project.eu/ns#> .
@prefix dbpedia-nl: <http://nl.dbpedia.org/resource/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dbpedia-it: <http://it.dbpedia.org/resource/> .

<http://freme-project.eu/#char=0,20>
        a               nif:String , nif:Context , nif:RFC5147String ;
        nif:beginIndex  "0"^^xsd:int ;
        nif:endIndex    "20"^^xsd:int ;
        nif:isString    "Dublin is in Ireland"^^xsd:string .

<http://freme-project.eu/#char=0,6>
        a                     nif:RFC5147String , nif:Word , nif:String , nif:Phrase ;
        nif:anchorOf          "Dublin"^^xsd:string ;
        nif:beginIndex        "0"^^xsd:int ;
        nif:endIndex          "6"^^xsd:int ;
        nif:referenceContext  <http://freme-project.eu/#char=0,20> ;
        itsrdf:taClassRef     <http://nerd.eurecom.fr/ontology#Location> ;
        itsrdf:taConfidence   "0.9885517011780088"^^xsd:double ;
        itsrdf:taIdentRef     dbpedia:Dublin .
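
To see at a glance which entities a Turtle response contains, the nif:anchorOf values can be pulled out with grep and sed. This is a rough sketch that assumes the response is formatted like the one above, with one property per line and double-quoted literals. It is a text filter, not a real RDF parser, and spotted_entities is a hypothetical helper name.

```shell
# spotted_entities
# Read a Turtle response on stdin and print one nif:anchorOf value per line.
# Hypothetical helper; relies on the line layout shown in the response above.
spotted_entities() {
  grep -o 'nif:anchorOf[[:space:]]*"[^"]*"' | sed 's/^[^"]*"//; s/"$//'
}

# Excerpt of the second response, used here as sample input.
spotted_entities <<'EOF'
<http://freme-project.eu/#char=0,20>
        nif:isString    "Dublin is in Ireland"^^xsd:string .

<http://freme-project.eu/#char=0,6>
        nif:anchorOf          "Dublin"^^xsd:string ;
        itsrdf:taIdentRef     dbpedia:Dublin .
EOF
```

With a live endpoint, the curl output could be piped straight into spotted_entities. A missing name, such as Ireland here, then shows up as a missing line.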

m1ci commented 8 years ago

only Dublin (but not Ireland) is recognized as an entity in the text Dublin is in Ireland. The e-Entity service with DBpedia Spotlight engine properly recognizes all entities.

This should be solved with https://github.com/freme-project/freme-ner/commit/eae252d765f680c7622dcda953e6cf69371cd3ac. I tried it, and both Dublin and Ireland are spotted. See http://api-dev.freme-project.eu/current/e-entity/freme-ner/documents/?informat=text&input=Dublin%20is%20in%20Ireland.&outformat=turtle&language=en&dataset=dbpedia

m1ci commented 8 years ago

@xFran

Germany/German is mentioned 13 times in the content, so this entity is OK. China/China’s is mentioned 4 times but was not spotted as an entity.

This should be fixed with https://github.com/freme-project/freme-ner/commit/eae252d765f680c7622dcda953e6cf69371cd3ac

I tried, and got China spotted.

x-fran commented 8 years ago

I tried, and got China spotted.

How is that? Did you fix something? Was my curl command wrong? Please send me your text document along with the curl request so I can test it. Should we close the issue then?

m1ci commented 8 years ago

Yes, it was fixed yesterday. It was a bug.

x-fran commented 8 years ago

I will test it again, and if it is OK I will close the issue.