cnut1648 closed this issue 2 years ago
Can you give me an example of code where this happens?
The intention is that GENRE always outputs a valid page name from KILT. The constrained beam search should make that happen.
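To illustrate the idea of trie-constrained decoding, here is a toy sketch. It is not GENRE's actual implementation: the `Trie` class, the whitespace tokenization, and the `</s>` end marker are all assumptions made for illustration. At each decoding step, only tokens that continue some known page name are allowed, so the finished output can only be a title that was inserted into the trie.

```python
from typing import Dict, List, Sequence


class Trie:
    """Toy prefix tree over token sequences (illustrative, not GENRE's own class)."""

    def __init__(self, sequences: Sequence[Sequence[str]] = ()):
        self.root: Dict[str, dict] = {}
        for seq in sequences:
            self.add(seq)

    def add(self, seq: Sequence[str]) -> None:
        # Walk down the tree, creating child nodes as needed.
        node = self.root
        for tok in seq:
            node = node.setdefault(tok, {})

    def allowed_next(self, prefix: Sequence[str]) -> List[str]:
        # Return the tokens that may follow `prefix`; empty if the prefix is unknown.
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []
            node = node[tok]
        return sorted(node)


# Hypothetical title list; a real trie would hold every page name in the knowledge base.
titles = ["German Americans", "German Canadians", "Germany"]
trie = Trie([t.split() + ["</s>"] for t in titles])

print(trie.allowed_next(["German"]))
print(trie.allowed_next(["Ethnic"]))
```

During beam search, a function like `allowed_next` would be called with the tokens generated so far, and every other token would be masked out. Any title missing from the trie (or present under a different spelling) can therefore never be produced.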
Hi @nicola-decao, for example, consider the first line of aida-train-kilt.jsonl (downloaded from your script). Some of the candidates are "Ethnic Germans", "Canadians of German ethnicity", and "German American".
If you search for these three candidates in KILT (kilt_knowledgesource.json, downloaded from the official KILT repo), I don't think there is any entry with wikipedia_title equal to "Ethnic Germans", "Canadians of German ethnicity", or "German American". (I think they should instead be "German ethnic", "German Canadians", and "German Americans".)
There are also instances whose labels are not in KILT, for example line 147 of aida-train-kilt.jsonl ('Channel 2 (Israel)').
Please let me know if I misunderstand anything. Thanks!
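The check described above can be sketched roughly as follows. This assumes kilt_knowledgesource.json is a JSON Lines file (one page per line) where each record has a wikipedia_title field; the helper names `load_kilt_titles` and `missing_titles` are made up for this example.

```python
import json
from typing import Iterable, List, Set


def load_kilt_titles(path: str) -> Set[str]:
    """Collect every wikipedia_title from a JSON Lines knowledge source (assumed layout)."""
    titles: Set[str] = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                titles.add(json.loads(line)["wikipedia_title"])
    return titles


def missing_titles(candidates: Iterable[str], kilt_titles: Set[str]) -> List[str]:
    """Return the candidates that have no exact-match title in the knowledge source."""
    return [c for c in candidates if c not in kilt_titles]
```

For instance, `missing_titles(["Ethnic Germans", "German American"], load_kilt_titles("kilt_knowledgesource.json"))` would list whichever of those strings has no exact-match entry. The comparison is exact and case-sensitive, so redirects or alternative spellings are not resolved.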
OK, that makes sense now! AIDA uses YAGO as its knowledge base, so there is no alignment with Wikipedia. aida-train-kilt.jsonl contains the standard candidates that most researchers use to evaluate EL models.
Oh I see, from the name I mistakenly thought the candidates in aida-train-kilt.jsonl align with KILT. Thanks for the clarification.
Hello, I'm working on the Entity Disambiguation task. According to the documentation, the GENRE trie seems to be built from BLINK's Wikipedia dump, which is what KILT uses. I also think GENRE uses KILT-formatted data, meaning that the candidates and entities in those datasets follow the KILT data source entities. Is that correct? I found that sometimes candidates given by GENRE (e.g. "German Brazilian") do not appear in the KILT data source. I wonder if there is a mapping from GENRE outputs to KILT data source entities. This would help us build a page that links to the actual Wikipedia ID (since KILT has it). Thanks!
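As a rough sketch of what such a mapping could look like, the snippet below builds a title-to-ID lookup by exact title match. It assumes the KILT knowledge source is a JSON Lines file whose records carry wikipedia_title and wikipedia_id fields; `build_title_to_id` and `link_titles` are hypothetical helpers, not part of GENRE or KILT.

```python
import json
from typing import Dict, Iterable, Optional


def build_title_to_id(path: str) -> Dict[str, str]:
    """Map each wikipedia_title to its wikipedia_id (assumed field names)."""
    mapping: Dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                page = json.loads(line)
                # IDs are stored as strings here so the type is uniform.
                mapping[page["wikipedia_title"]] = str(page["wikipedia_id"])
    return mapping


def link_titles(
    titles: Iterable[str], title_to_id: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Attach a Wikipedia ID to each title, or None when no exact match exists."""
    return {t: title_to_id.get(t) for t in titles}
```

Titles that come back as None are the ones that would need a separate alignment step (for example via redirects), since exact matching alone cannot recover them.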