facebookresearch / GENRE

Autoregressive Entity Retrieval

alignment between candidate and KILT wikipedia data source #85

Closed. cnut1648 closed this issue 2 years ago

cnut1648 commented 2 years ago

Hello, I'm working on the Entity Disambiguation task. According to the documentation, the GENRE trie seems to be built from BLINK's Wikipedia dump, which is the one KILT uses. I also think GENRE uses KILT-formatted data, meaning that the candidates and entities in those datasets follow the KILT data source entities. Is that correct? I found that some candidates given by GENRE (e.g. "German Brazilian") do not appear in the KILT data source. I wonder if there is a mapping from GENRE outputs to KILT data source entities. This would help us build a page that links to the actual Wikipedia ID (since KILT has it). Thanks!
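For reference, the kind of mapping asked about here could be built directly from the KILT knowledge source. The following is only a minimal sketch: it assumes `kilt_knowledgesource.json` is a JSON-lines file whose records carry `wikipedia_title` and `wikipedia_id` fields, and the helper names `build_title_index` and `link_for` are hypothetical, not part of GENRE or KILT. The `curid` URL form is likewise an assumption about how one might link to a page by ID.

```python
import json
from typing import Dict, Iterable, Optional


def build_title_index(lines: Iterable[str]) -> Dict[str, str]:
    """Map wikipedia_title -> wikipedia_id from KILT knowledge-source lines.

    Assumes one JSON object per line with `wikipedia_title` and
    `wikipedia_id` keys (an assumption about the file layout).
    """
    index: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        page = json.loads(line)
        index[page["wikipedia_title"]] = str(page["wikipedia_id"])
    return index


def link_for(title: str, index: Dict[str, str]) -> Optional[str]:
    """Return a Wikipedia URL for a title, or None if KILT has no such page."""
    wiki_id = index.get(title)
    if wiki_id is None:
        return None
    return f"https://en.wikipedia.org/?curid={wiki_id}"


# Usage (file path is an assumption; streaming keeps memory use modest):
# with open("kilt_knowledgesource.json", encoding="utf-8") as f:
#     index = build_title_index(f)
# print(link_for("German Brazilian", index))
```

A title that returns `None` is one that has no exact `wikipedia_title` match in the knowledge source, which is the situation described above.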

nicola-decao commented 2 years ago

Can you give me an example of code where this happens?

The intention is that GENRE always outputs a valid page name from KILT. The constrained beam search should make that happen.

cnut1648 commented 2 years ago

Hi @nicola-decao, for example, consider the first line of aida-train-kilt.jsonl (downloaded with your script). Some of its candidates are "Ethnic Germans", "Canadians of German ethnicity", and "German American".

If you search for these three candidates in KILT (kilt_knowledgesource.json, downloaded from the official KILT repo), I don't think there is any entry with wikipedia_title equal to "Ethnic Germans", "Canadians of German ethnicity", or "German American". (I think they should instead be "German ethnic", "German Canadians", and "German Americans".)

There are also instances whose labels are not in KILT, for example line 147 of aida-train-kilt.jsonl ("Channel 2 (Israel)").
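A check along these lines can be scripted. The sketch below is an illustration under assumptions, not the exact code used: it assumes each line of aida-train-kilt.jsonl is a JSON object with a `candidates` list of titles and an `output` list whose items have an `answer` field, and it takes the set of known KILT titles as an input. The function name `missing_titles` is hypothetical.

```python
import json
from typing import Iterable, Iterator, Set, Tuple


def missing_titles(
    dataset_lines: Iterable[str], known_titles: Set[str]
) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, kind, title) for titles absent from known_titles.

    `kind` is "candidate" or "label". The `candidates`, `output`, and
    `answer` field names are assumptions about the dataset layout.
    """
    for line_no, line in enumerate(dataset_lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Candidates offered for this mention.
        for title in record.get("candidates", []):
            if title not in known_titles:
                yield line_no, "candidate", title
        # Gold labels for this mention.
        for out in record.get("output", []):
            answer = out.get("answer")
            if answer is not None and answer not in known_titles:
                yield line_no, "label", answer


# Usage (paths are assumptions; known_titles would hold every
# wikipedia_title found in kilt_knowledgesource.json):
# with open("aida-train-kilt.jsonl", encoding="utf-8") as f:
#     for line_no, kind, title in missing_titles(f, known_titles):
#         print(line_no, kind, title)
```

Exact string matching is used on purpose, since the question is whether a title appears verbatim as a `wikipedia_title`.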

Please let me know if I have misunderstood anything. Thanks!

nicola-decao commented 2 years ago

OK, that makes sense now! AIDA uses YAGO as its knowledge base, so there is no alignment with Wikipedia. aida-train-kilt.jsonl contains the standard candidates that most researchers use to evaluate EL models.

cnut1648 commented 2 years ago

Oh I see. From the name, I mistakenly thought the candidates in aida-train-kilt.jsonl aligned with KILT. Thanks for the clarification.