freme-project / freme-ner


"|" Spotted as entity #169

Open x-fran opened 7 years ago

x-fran commented 7 years ago

Text file: freme.txt

cURL:

curl -X POST --header 'Content-Type: text/plain' --header 'Accept: text/turtle' -d @freme.txt 'https://api.freme-project.eu/current/e-entity/freme-ner/documents?language=en&dataset=dbpedia&mode=all&nif-version=2.1' >> freme_out.txt

The issue:


<http://freme-project.eu/#offset_1490_1491>
        a                     nif:OffsetBasedString , nif:Phrase ;
        nif:anchorOf          "|"^^xsd:string ;
        nif:annotationUnit    [ a                       nif:EntityOccurrence ;
                                nif:taMsClassRef        <http://dbpedia.org/ontology/Country> ;
                                itsrdf:taAnnotatorsRef  <http://freme-project.eu/tools/freme-ner> ;
                                itsrdf:taClassRef       <http://dbpedia.org/ontology/Place> , <http://dbpedia.org/ontology/PopulatedPlace> , <http://nerd.eurecom.fr/ontology#Organization> , <http://dbpedia.org/ontology/Country> , <http://dbpedia.org/ontology/Location> ;
                                itsrdf:taConfidence     "0.3828838465354498"^^xsd:double ;
                                itsrdf:taIdentRef       <http://dbpedia.org/resource/United_States>
                              ] ;
        nif:beginIndex        "1490"^^xsd:nonNegativeInteger ;
        nif:endIndex          "1491"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://freme-project.eu/#offset_0_1529> .

m1ci commented 7 years ago

This happens because the input content is not clean. FREME NER expects "clean" content, i.e. text made up of regular sentences (to some extent).

Leaving the issue open for possible further developments.
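
As a possible client-side workaround, the input could be pre-cleaned before it is posted. The following is a minimal sketch, not part of FREME NER: the function name `blank_noise_tokens` and the definition of a "noise" token (a whitespace-delimited run of symbols with no letters or digits, such as "|") are assumptions made for illustration. Whether this actually prevents the spurious annotation has not been checked against the service. Symbols are replaced by spaces of the same length so that the character offsets in the NIF output should still line up with the original file.

```python
import re

# Assumption: a "noise" token is a whitespace-delimited run made up only of
# punctuation/symbol characters (no letters or digits), such as "|" or "---".
_NOISE_TOKEN = re.compile(r"(?<!\S)[^\w\s]+(?!\S)")


def blank_noise_tokens(text: str) -> str:
    """Replace symbol-only tokens with spaces of the same length.

    Keeping the length unchanged means offsets computed on the cleaned
    text still point at the same positions in the original text.
    """
    return _NOISE_TOKEN.sub(lambda m: " " * len(m.group()), text)


if __name__ == "__main__":
    sample = "Home | About | The United States is a country."
    print(repr(blank_noise_tokens(sample)))
```

The cleaned text could then be written to a separate file and sent with the same cURL call as above. Note that this also blanks standalone dashes and similar separators inside ordinary sentences, which may or may not be desirable.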