Closed ghsnd closed 8 years ago
Your questions:
- Should data in the `<title>` section appear in the input for NER?
- Should text separated by a few newlines (\nOrigins\n \nOrigins\n \n\n \nLeipzig in this case) be detectable as one entity?
...relate to e-Internationalization (@katia-vistatec). Maybe I can respond to 2): the text is quite irregular, and FREME NER was trained on "normal" texts.
Hi. The title appears in the NIF file because it is a text unit, just like the text in paragraphs or headings. I don't know whether there is some reason to have it in the NIF.
Thanks for the answers. I think this is more a philosophical discussion than a technical one, so let's close it.
Hi,
Not entirely sure if this issue is a real issue, and if it belongs to FREME NER or e-Internationalization...
I have an HTML page here: history.html.txt.
It starts with
Apply FREME NER (dev) with the following command:
Having the word Origins in the `<title>` and a `<h1>`, it produces a.o. this in the output:
which, IMO, seems good (correct me if I'm wrong here), though the first Origins comes from the HTML title, which is only displayed as the title of the browser window.
FREME NER detects Origins Origins Leipzig as an entity:
And what I actually expect is that Leipzig is detected as an entity.
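For illustration only, here is a minimal sketch of how the text could be split into separate units with the `<title>` left out before it is sent to NER. It uses Python's standard `html.parser` and is not how e-Internationalization or FREME NER work internally; the names `TextUnitExtractor`, `text_units` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class TextUnitExtractor(HTMLParser):
    """Collects text units from HTML, skipping <title>, <script> and <style>."""

    SKIPPED_TAGS = {"title", "script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.units = []       # one entry per non-empty text node

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse runs of whitespace and newlines inside a single text node.
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.units.append(text)


def text_units(html):
    """Return the visible text units of an HTML string, in document order."""
    parser = TextUnitExtractor()
    parser.feed(html)
    parser.close()
    return parser.units


# Hypothetical sample, loosely modelled on the page described above.
SAMPLE_HTML = (
    "<html><head><title>Origins</title></head>"
    "<body><h1>Origins</h1>\n<p>Leipzig</p></body></html>"
)

if __name__ == "__main__":
    print(text_units(SAMPLE_HTML))
```

Running NER on each unit separately, instead of on the units joined by newlines, would keep a heading such as Origins from being merged with Leipzig in the following paragraph. Whether that is desirable is the open question here.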
So my questions are:
- Should data in the `<title>` section appear in the input for NER?