dice-group / gerbil

GERBIL - General Entity annotatoR Benchmark

AQUAINT - Evaluation Corpus #89

Closed by dav009 9 years ago

dav009 commented 9 years ago

I'm not sure this is an issue; I just want to be clear about how the AQUAINT corpus was processed for evaluation.

For example, some of the annotated entities have since become disambiguation pages or redirects, e.g. Leninsk-Kuznetsky. So I wonder whether some kind of preprocessing was done on the corpus to ensure that those cases do not hinder the evaluation. For example, some systems will return identifiers based on a more recent version of the canonical Wikipedia identifiers.

Thanks

MichaelRoeder commented 9 years ago

Hi David Przybilla,

At the moment (versions 1.1.X), we do no preprocessing of the corpora and follow the behaviour of the BAT-framework. Thus, the system relies on the current version of Wikipedia to check whether entities exist. If an entity is identified as a redirect, it is resolved to the correct Wikipedia ID. If it does not exist, it is not part of the benchmark.
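To make the behaviour described above concrete, here is a minimal sketch of the redirect-resolution and filtering logic. It is not GERBIL's or the BAT-framework's actual code: the in-memory tables `PAGE_IDS` and `REDIRECTS`, the page titles and IDs, and the function names `resolve_page_id` and `filter_gold` are all hypothetical stand-ins for a lookup against the current Wikipedia.

```python
from typing import Dict, List, Optional

# Hypothetical snapshot of the "current" Wikipedia (illustrative data only):
# canonical titles with their page IDs, plus a redirect table.
PAGE_IDS: Dict[str, int] = {"Canonical Page": 101, "Other Page": 202}
REDIRECTS: Dict[str, str] = {"Old Title": "Canonical Page"}


def resolve_page_id(title: str, max_hops: int = 5) -> Optional[int]:
    """Follow redirects to a canonical title and return its page ID.

    Returns None if the page does not exist, if the redirects form a cycle,
    or if the chain is longer than max_hops.
    """
    seen = set()
    for _ in range(max_hops):
        if title in seen:  # redirect cycle
            return None
        seen.add(title)
        if title in REDIRECTS:
            title = REDIRECTS[title]
            continue
        return PAGE_IDS.get(title)
    return None


def filter_gold(titles: List[str]) -> List[int]:
    """Keep only gold annotations that still resolve to an existing page."""
    resolved = (resolve_page_id(t) for t in titles)
    return [pid for pid in resolved if pid is not None]


if __name__ == "__main__":
    # "Old Title" is resolved via its redirect; "Deleted Page" is dropped.
    print(filter_gold(["Old Title", "Deleted Page", "Other Page"]))
```

Under these assumptions, an annotation whose title has become a redirect is mapped to the ID of its target, while an annotation whose page no longer exists is simply left out of the gold standard.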

This Wikipedia ID dependency will be gone with version 1.2.0 of GERBIL, as we want to stick to URIs as mentioned in the GERBIL publication. Currently, there are no plans for 1) introducing an automatic check for URI existence or 2) updating all datasets to the latest DBpedia version. But this is still under discussion.

Does that answer your question?

Cheers, Michael Röder

dav009 commented 9 years ago

Hi Michael,

It definitely does. Thanks for the quick and clear response.