freme-project / freme-ner

Apache License 2.0

"•" Dot spotted as entity #168

Open x-fran opened 7 years ago

x-fran commented 7 years ago

File: freme.txt

cURL:

curl -X POST --header 'Content-Type: text/plain' --header 'Accept: text/turtle' -d @freme.txt 'https://api.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=all&nif-version=2.1' >> freme_out.txt
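For anyone reproducing this outside the shell, here is a minimal Python sketch of the same request. It uses only the endpoint, query parameters and headers from the curl command above. The helper name `build_ner_request` is hypothetical. One difference is deliberate. `curl -d @file` strips carriage returns and newlines from the file before sending it. This sketch sends the text byte for byte, so the offsets in the response may not match the ones from the curl run.

```python
import urllib.parse
import urllib.request

# Endpoint and query parameters copied from the curl command in this issue.
BASE_URL = "https://api.freme-project.eu/current/e-entity/freme-ner/documents"
PARAMS = {
    "language": "en",
    "dataset": "dbpedia",
    "mode": "all",
    "nif-version": "2.1",
}


def build_ner_request(text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl call.

    Hypothetical helper. Unlike `curl -d @file`, the body is sent unchanged,
    so newlines in the input are preserved.
    """
    url = BASE_URL + "?" + urllib.parse.urlencode(PARAMS)
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain", "Accept": "text/turtle"},
        method="POST",
    )
```

To actually send it, pass the request to `urllib.request.urlopen(...)` and append `resp.read()` to `freme_out.txt` in binary append mode (`"ab"`). This mirrors the `>>` redirection.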

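The problem: the bullet character "•" (offsets 1415 to 1416) is annotated as an entity and linked to dbpedia:United_States: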


<http://freme-project.eu/#offset_1415_1416>
        a                     nif:OffsetBasedString , nif:Phrase ;
        nif:anchorOf          "•"^^xsd:string ;
        nif:annotationUnit    [ a                       nif:EntityOccurrence ;
                                nif:taMsClassRef        <http://dbpedia.org/ontology/Country> ;
                                itsrdf:taAnnotatorsRef  <http://freme-project.eu/tools/freme-ner> ;
                                itsrdf:taClassRef       <http://dbpedia.org/ontology/Country> , <http://www.w3.org/2002/07/owl#Thing> , <http://dbpedia.org/ontology/Location> , <http://dbpedia.org/ontology/PopulatedPlace> , <http://dbpedia.org/ontology/Place> ;
                                itsrdf:taConfidence     "0.3195015578351836"^^xsd:double ;
                                itsrdf:taIdentRef       <http://dbpedia.org/resource/United_States>
                              ] ;
        nif:beginIndex        "1415"^^xsd:nonNegativeInteger ;
        nif:endIndex          "1416"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://freme-project.eu/#offset_0_1570> .
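Until the service handles this, a client could drop such annotations after the fact. The following is a minimal sketch of that workaround. It is not something FREME NER provides. It assumes the Turtle output has already been parsed into flat records, for example with an RDF library. The `Mention` type and both functions are hypothetical. The heuristic is that a real entity mention contains at least one letter or digit, and "•" does not.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mention:
    """Hypothetical flat view of one nif:Phrase from the Turtle output."""
    anchor: str        # nif:anchorOf
    begin: int         # nif:beginIndex
    end: int           # nif:endIndex
    ident: str         # itsrdf:taIdentRef
    confidence: float  # itsrdf:taConfidence


def is_plausible_surface_form(anchor: str) -> bool:
    """Heuristic: an entity mention must contain at least one letter or digit."""
    return any(ch.isalnum() for ch in anchor)


def drop_symbol_mentions(mentions: List[Mention]) -> List[Mention]:
    """Remove mentions such as the bullet "•" that consist only of symbols or punctuation."""
    return [m for m in mentions if is_plausible_surface_form(m.anchor)]
```

With the annotation above loaded as `Mention("•", 1415, 1416, "http://dbpedia.org/resource/United_States", 0.3195015578351836)`, `drop_symbol_mentions` discards it. A confidence threshold would not help here. The confidence (about 0.32) does not separate this case cleanly from genuine low-confidence links.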

m1ci commented 7 years ago

This happens because the content is not clean. FREME NER expects "clean" input, i.e. text made up of (more or less) regular sentences.

Leaving the issue open for possible further developments.
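One way to give FREME NER cleaner input is to remove list markers before sending the text. The sketch below rests on two assumptions. First, bullet glyphs carry no meaning for NER. Second, the service counts offsets in characters. The glyph set in `BULLET_CHARS` and the function name `strip_bullets` are illustrative and not part of FREME. Each bullet is replaced by a single space rather than deleted. This keeps the text the same length, so `nif:beginIndex`/`nif:endIndex` values still point into the original file.

```python
# Bullet-like glyphs treated as list markers; this set is an assumption, extend as needed.
BULLET_CHARS = "\u2022\u25e6\u25aa\u2023\u2043"  # • ◦ ▪ ‣ ⁃


def strip_bullets(text: str) -> str:
    """Replace each bullet glyph with a space.

    One character is swapped for one character, so character offsets
    computed on the cleaned text also apply to the original text.
    """
    return text.translate({ord(c): " " for c in BULLET_CHARS})
```

This only covers the bullet case. Other "unclean" content, such as tables, menus or fragments without sentence structure, would need its own handling.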