freme-project / freme-ner

Apache License 2.0

re-index DBpedias by not indexing disambiguation pages #34

Open m1ci opened 8 years ago

m1ci commented 8 years ago

Currently, we index all label-URI pairs; however, some pairs point to disambiguation pages. The issue https://github.com/freme-project/e-Entity/issues/49 results from this. We need to re-index the DBpedia datasets with the disambiguation pages removed.

To determine whether a URI is a disambiguation page, we can use the DBpedia disambiguation pages dataset: http://downloads.dbpedia.org/2015-04/core/disambiguations_en.nt.bz2
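A minimal sketch of how the filtering could look, not the actual FREME NER indexing code. It assumes the dump is N-Triples in which the subject of each `http://dbpedia.org/ontology/wikiPageDisambiguates` triple is the disambiguation page, and that the label-URI pairs are available as `(label, uri)` tuples. The function names are hypothetical.

```python
import bz2
import re

# Captures the subject and predicate IRIs of one N-Triples line.
TRIPLE_RE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+')

# Assumption: predicate used in the disambiguations dataset.
DISAMBIGUATES = "http://dbpedia.org/ontology/wikiPageDisambiguates"


def disambiguation_uris(lines):
    """Collect the subjects of all wikiPageDisambiguates triples."""
    uris = set()
    for line in lines:
        match = TRIPLE_RE.match(line)
        # Comment lines and other predicates are skipped.
        if match and match.group(2) == DISAMBIGUATES:
            uris.add(match.group(1))
    return uris


def load_disambiguation_uris(path):
    """Read a locally downloaded, bz2-compressed N-Triples dump."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return disambiguation_uris(handle)


def filter_pairs(pairs, disambig):
    """Keep only (label, uri) pairs whose URI is not a disambiguation page."""
    return [(label, uri) for label, uri in pairs if uri not in disambig]
```

With the dump downloaded locally, `filter_pairs(pairs, load_disambiguation_uris("disambiguations_en.nt.bz2"))` would give the pairs to index.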

nilesh-c commented 8 years ago

Hi Milan, doing only this will not solve the problem, because a disambiguation page is nothing but a collection of links to other pages. Even if the disambiguation page is removed, such wrongly spotted entities will still get linked to other URIs.

For example, NOT (if it is recognised as an entity mention by the NER layer) will get linked to http://dbpedia.org/resource/Inverter_(logic_gate) because there is a mapping between NOT and that URI.

jnehring commented 8 years ago

I think that an entity linked to a disambiguation page is wrong with 100% probability. When FREME NER has to choose another link, disambiguation may or may not work well, but at least there is a chance of success. So even if that does not solve the problem of NOT being detected as an entity, it might improve FREME NER performance on the DBpedia dataset.

But please don't implement this task now. It is just an idea that should not get lost. I am sure we have many ways to improve FREME NER, and we should implement the most promising improvements first.

m1ci commented 7 years ago

This is not a critical issue for FREME 1.0 and will be left open for future development.