jnehring closed this issue 8 years ago
Thanks Jan,
I have added the examples file to Google Drive: https://drive.google.com/open?id=0B1v6TnDXhoIbVXVEMHE1ZnliQW8
Let me know if you need anything else.
How about excluding disambiguation pages in general from the named entity detection?
Nice catch. Our training data contains surface forms that point to disambiguation pages. We will remove such cases from the training data and re-index DBpedia.
I just don't know how to identify a page as a disambiguation page. Maybe disambiguation pages have the property dbo:wikiPageDisambiguates?
There is a DBpedia dataset partition containing only disambiguation pages: http://downloads.dbpedia.org/2015-04/core/disambiguations_en.nt.bz2
We will use it to clean our training data. Note that this dataset is not 100% accurate: it is built from heuristics, since Wikipedia has no syntax that distinguishes disambiguation links from ordinary links. But IMO it is good enough for our case.
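For illustration, the cleaning step could look roughly like the sketch below. It is not the actual FREME code. It assumes the dump is plain N-Triples with one `dbo:wikiPageDisambiguates` triple per line and the disambiguation page as the subject, and that the training data can be viewed as (surface form, DBpedia URI) pairs. The function names are hypothetical.

```python
import bz2
from typing import Iterable, List, Set, Tuple

# Assumption: the dump uses this predicate, with the disambiguation page as subject.
DISAMBIGUATES = "<http://dbpedia.org/ontology/wikiPageDisambiguates>"


def disambiguation_subjects(lines: Iterable[str]) -> Set[str]:
    """Collect subject URIs of all wikiPageDisambiguates triples."""
    subjects: Set[str] = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # N-Triples: <subject> <predicate> <object> .
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[1] == DISAMBIGUATES:
            subjects.add(parts[0].strip("<>"))
    return subjects


def load_disambiguation_pages(path: str) -> Set[str]:
    """Read the bz2-compressed dump, e.g. disambiguations_en.nt.bz2."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return disambiguation_subjects(handle)


def drop_disambiguation_targets(
    pairs: Iterable[Tuple[str, str]], disambiguation_pages: Set[str]
) -> List[Tuple[str, str]]:
    """Keep only (surface form, URI) pairs whose URI is not a disambiguation page."""
    return [(form, uri) for form, uri in pairs if uri not in disambiguation_pages]
```

With the real dump, `load_disambiguation_pages("disambiguations_en.nt.bz2")` would return the set to filter against. Since the dataset is heuristic, some ordinary pages may be dropped and some disambiguation pages may slip through.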
I am closing this issue because it will be solved by freme-project/freme-ner#34
Yes, "NOT" will not be linked to http://dbpedia.org/resource/Not, but "NOT" will still be spotted as an entity.
"Not" is detected as an entity. E.g. this call
produces this NIF:
When I look at http://dbpedia.org/page/Not and the corresponding Wikipedia page http://en.wikipedia.org/wiki/Not, I wonder what kind of named entity this is. It seems that "Not" is a disambiguation page.
How about excluding disambiguation pages in general from the named entity detection? I think they will produce bad entities in every case.
I just don't know how to identify a page as a disambiguation page. Maybe disambiguation pages have the property dbo:wikiPageDisambiguates?