Open m1ci opened 8 years ago
Hi Milan, doing only this will not solve the problem because disambiguation pages are nothing but a collection of pages. Even if the disambiguation page is removed, such wrongly spotted entities will get linked to other URIs.
For example, NOT (if it is recognised as an entity mention by the NER layer) will get linked to http://dbpedia.org/resource/Inverter_(logic_gate) because there is a mapping between NOT and that URI.
I think that an entity linked to a disambiguation page is wrong with 100% probability. When FREME NER has to choose another link, disambiguation may or may not work well, but at least there is a chance of success. So even if that does not solve the problem of NOT being detected as an entity, it might improve FREME NER performance on the DBpedia dataset.
But please don't implement this task now. It is just an idea that should not get lost. I am sure we have many ways to improve FREME NER and we should implement the most promising improvements first.
This is not a critical issue for FREME 1.0 and will be left open for future development.
Currently, we index all label-URI pairs; however, some pairs point to disambiguation pages. The issue https://github.com/freme-project/e-Entity/issues/49 results from this. We need to re-index DBpedia with the disambiguation pages removed.
To determine whether a URI is a disambiguation page, we can use the DBpedia disambiguation pages dataset: http://downloads.dbpedia.org/2015-04/core/disambiguations_en.nt.bz2
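
A minimal sketch of the filtering step, not the actual FREME NER indexing code. It assumes the dump is N-Triples in which the subject of each `http://dbpedia.org/ontology/wikiPageDisambiguates` triple is the disambiguation page, and that the label-URI pairs are available as plain tuples. The function names and the sample triple are hypothetical, for illustration only.

```python
import bz2
import re

# Captures the subject and predicate URIs of one N-Triples line.
TRIPLE_RE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+')
# Assumption: predicate used in disambiguations_en.nt.
DISAMBIGUATES = "http://dbpedia.org/ontology/wikiPageDisambiguates"


def disambiguation_uris(lines):
    """Collect the subject URIs of all wikiPageDisambiguates triples."""
    uris = set()
    for line in lines:
        match = TRIPLE_RE.match(line)
        if match and match.group(2) == DISAMBIGUATES:
            uris.add(match.group(1))
    return uris


def load_disambiguation_uris(path):
    """Read the bz2-compressed dump line by line without unpacking it to disk."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return disambiguation_uris(handle)


def filter_pairs(pairs, disambig):
    """Keep only label-URI pairs whose URI is not a disambiguation page."""
    return [(label, uri) for label, uri in pairs if uri not in disambig]


# Illustrative data only, not taken from the real dump.
sample_dump = [
    "<http://dbpedia.org/resource/NOT> "
    "<http://dbpedia.org/ontology/wikiPageDisambiguates> "
    "<http://dbpedia.org/resource/Inverter_(logic_gate)> .",
]
sample_pairs = [
    ("NOT", "http://dbpedia.org/resource/NOT"),
    ("NOT", "http://dbpedia.org/resource/Inverter_(logic_gate)"),
]
kept = filter_pairs(sample_pairs, disambiguation_uris(sample_dump))
```

In this sample the pair pointing at the disambiguation page is dropped, while the pair pointing at `Inverter_(logic_gate)` survives. So filtering removes links that are certainly wrong, but it does not stop NOT from being spotted as an entity.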