dice-group / gerbil

GERBIL - General Entity annotatoR Benchmark
GNU Affero General Public License v3.0

Replace a time-consuming lookup for the ERD dataset #456

Open MichaelRoeder opened 3 months ago


Problem

At the moment, loading the ERD dataset triggers a time-consuming lookup that transforms Freebase IRIs into DBpedia IRIs. See https://github.com/dice-group/gerbil/blob/master/src/main/java/org/aksw/gerbil/dataset/impl/erd/ERDDataset2.java#L100

Solution

There are two possible solutions:

  1. The easiest solution would be to run the lookup once and store the dataset with the retrieved DBpedia IRIs (e.g., as a NIF file).
  2. Make use of a sameAs lookup service.
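
To illustrate the first option, here is a rough sketch of the "look up once, then reuse" idea. It is not GERBIL code: `CachedIriLookup`, its methods, and the tab-separated cache file are all hypothetical, and the existing slow lookup is represented only by a generic `Function<String, String>`. For simplicity it stores a plain IRI mapping rather than a full NIF file, so it might serve as an intermediate step at most.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: caches Freebase-to-DBpedia IRI mappings in a TSV file. */
public class CachedIriLookup {

    private final Map<String, String> cache = new LinkedHashMap<>();
    // Stand-in for the existing, time-consuming lookup (an assumption).
    private final Function<String, String> slowLookup;

    public CachedIriLookup(Function<String, String> slowLookup) {
        this.slowLookup = slowLookup;
    }

    /** Reads previously stored mappings, if the file exists. */
    public static CachedIriLookup load(Path file, Function<String, String> slowLookup)
            throws IOException {
        CachedIriLookup lookup = new CachedIriLookup(slowLookup);
        if (Files.exists(file)) {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                // Each line holds "<freebase IRI>\t<dbpedia IRI>".
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    lookup.cache.put(parts[0], parts[1]);
                }
            }
        }
        return lookup;
    }

    /** Returns the cached DBpedia IRI, calling the slow lookup only on a miss. */
    public String toDbpedia(String freebaseIri) {
        // If the slow lookup returns null, nothing is cached and null is returned.
        return cache.computeIfAbsent(freebaseIri, slowLookup);
    }

    /** Writes all known mappings so that later runs can skip the slow lookup. */
    public void store(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> entry : cache.entrySet()) {
            lines.add(entry.getKey() + "\t" + entry.getValue());
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }
}
```

The cache file would be generated once and shipped with the dataset, so the slow lookup would only run for IRIs that are missing from it.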