Closed fsasaki closed 9 years ago
Looks similar to #46
Thanks for reporting! Fixed with https://github.com/freme-project/freme-ner/commit/eae252d765f680c7622dcda953e6cf69371cd3ac
Note that the entity spotting models were trained on, and work well with, "clean" content: content without markup, without runs of blank spaces between tokens, etc. We assume that FREME NER clients clean their content before they submit it to FREME NER. Wripl already understands and handles this: they clean the content before it is submitted to FREME NER.
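For clients that do not have a clean-up step yet, here is a minimal sketch of what such a step could look like, using only the Python standard library. This is not FREME code and not what Wripl does; the names `clean_for_ner`, `_TextExtractor` and the list of inline tags are made up for illustration. It drops markup, skips `<script>` and `<style>` content, and collapses runs of whitespace into single spaces.

```python
import re
from html.parser import HTMLParser

# Hypothetical helper, not part of FREME NER: tags treated as inline,
# so that "<b>migrants</b>," does not get a space before the comma.
INLINE_TAGS = {"a", "abbr", "b", "em", "i", "small", "span", "strong", "sub", "sup", "u"}
SKIPPED_TAGS = {"script", "style"}


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores script/style content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def _boundary(self, tag):
        # Non-inline tags separate words, so emit a space for them.
        if tag not in INLINE_TAGS:
            self._chunks.append(" ")

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        self._boundary(tag)

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        self._boundary(tag)

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def clean_for_ner(raw):
    """Return plain text with markup removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flushes any text still buffered by the parser
    return re.sub(r"\s+", " ", parser.text()).strip()
```

Calling `clean_for_ner(html_string)` before building the request body would give the spotting models plain, single-spaced text. Real pages may need more than this (navigation, cookie banners, comments), so treat it as a starting point only.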
Great, thank you! About the clean-up: that is OK in request2, since the content type is text/html. That invokes e-Internationalisation (Okapi), so FREME NER receives the clean content. I'll close this bug then.
I submitted the following sentence to FREME NER:
Hundreds of thousands of migrants, many from Syria, Africa and Afghanistan, have been making their way from Turkey to the Balkans in recent months, in a bid to reach Germany, Sweden and other EU states.
The first request (see attachment request1.txt) had this sentence as part of an HTML file taken from http://www.bbc.com/news/world-europe-34576045 . Here FREME NER recognized only three entities in the whole text and none in the above sentence; see out1.txt.
The same sentence submitted to FREME NER on its own, without any other content, leads to many more entities being recognized. See the attachments for the requests and outputs. request1.txt out2.txt request2.txt out1.txt