facebookresearch / GENRE

Autoregressive Entity Retrieval

Differences in entity disambiguation results on WikilinksNED Unseen-Mentions dataset #58

dineshkh closed this issue 3 years ago

dineshkh commented 3 years ago

Hi Nicola,

  1. I ran GENRE on the WikilinksNED Unseen-Mentions dataset (taken from https://github.com/yasumasaonoe/ET4EL) and got an accuracy of 61.3, but the paper reports 63.52 (Table 6). I used the default settings: beam=10, max_len_a=384, max_len_b=15, and the KILT trie. Can you tell me the reason for this difference? Is it because I did not use the "YAGO trie"? Also, when you ran your experiment on the WikilinksNED Unseen-Mentions dataset, did you lowercase all the sentences?

  2. Does GENRE's entity disambiguation output depend on whether the context of the mention is written in lowercase or uppercase? Please see the outputs below for Sentence_1 and Sentence_2.

Sentence_1 = ["Einstein was a [START_ENT] German [END_ENT] physicist."]

[{'text': 'Germans', 'score': tensor(-0.2991, device='cuda:0')}

Sentence_2 = ["einstein was a [START_ENT] german [END_ENT] physicist."]

[{'text': 'Germany', 'score': tensor(-0.2907, device='cuda:0')},
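To put a number on how much casing matters, one option is to score the same test set twice, once as-is and once with the context lowercased. Below is a minimal sketch of that comparison, not the evaluation script used for the paper. `predict` is a hypothetical callable (an assumption, not part of GENRE) that takes one sentence and returns the top-scoring `'text'`; `lowercase_context`, `accuracy`, and `compare_casing` are illustrative names.

```python
from typing import Callable, Dict, Iterable, List, Tuple

START, END = "[START_ENT]", "[END_ENT]"

Example = Tuple[str, str]  # (sentence with mention markers, gold entity title)


def lowercase_context(sentence: str) -> str:
    """Lowercase the whole sentence but restore the mention markers."""
    lowered = sentence.lower()
    return lowered.replace(START.lower(), START).replace(END.lower(), END)


def accuracy(predict: Callable[[str], str], examples: Iterable[Example]) -> float:
    """Fraction of examples whose top-1 prediction equals the gold title."""
    examples = list(examples)
    if not examples:
        return 0.0
    correct = sum(predict(sentence) == gold for sentence, gold in examples)
    return correct / len(examples)


def compare_casing(
    predict: Callable[[str], str], examples: Iterable[Example]
) -> Dict[str, float]:
    """Score the same examples with original and lowercased context."""
    examples = list(examples)
    lowered: List[Example] = [(lowercase_context(s), g) for s, g in examples]
    return {
        "original": accuracy(predict, examples),
        "lowercased": accuracy(predict, lowered),
    }
```

In practice `predict` would wrap the model call and return the first candidate's `'text'` for each sentence. The gold titles have to come from the dataset itself.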

nicola-decao commented 3 years ago

Hi!

  1. I cannot remember whether I used the YAGO trie, I'm sorry. I also used an internal version of the data that was already preprocessed, so I am not sure whether someone else had applied additional data cleaning. I definitely did not lowercase the sentences, since the model is case sensitive. I would give the YAGO trie a try. My guess is that the easiest way to get it is to download the data https://drive.google.com/file/d/1OSKvIiXHVVaWUhQ1-fpvePTBQfgMT6Ps from https://github.com/dalab/end2end_neural_el. There is a file called entity_universe.txt that should contain all the entities in the YAGO KB (the authors of https://arxiv.org/abs/1808.07699 use it for entity linking). Another option is to download the data https://drive.google.com/file/d/1IDjXFnNnHf__MO5j_onw4YwR97oS8lAy from https://github.com/lephong/mulrel-nel. There should be a file called _p_m_e.txt that contains the list of all mentions and entities in the YAGO KB. There are ~500k entities, if I remember correctly.
  2. Yes, the model is case sensitive, so its output will differ depending on the casing.
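As a rough illustration of why the choice of trie can change the results: constrained decoding only allows continuations that stay on a path in a prefix tree built from the candidate entity titles, so a smaller entity set (such as the YAGO one) rules out more wrong candidates. The sketch below is a simplified stand-in, not GENRE's own trie class. `PrefixTrie`, `build_trie`, `toy_encode`, and the `EOS` id are hypothetical names, and the character-level encoder is a toy replacement for the model's real tokenizer. The titles are assumed to have been read already from a file such as entity_universe.txt, whose exact format is not covered here.

```python
from typing import Callable, Dict, Iterable, List, Sequence

EOS = 0  # toy end-of-sequence id (assumption; a real tokenizer defines its own)


def toy_encode(title: str) -> List[int]:
    """Toy character-level encoder, standing in for the model tokenizer."""
    return [ord(ch) for ch in title] + [EOS]


class PrefixTrie:
    """Nested-dict prefix tree over token id sequences."""

    def __init__(self, sequences: Iterable[Sequence[int]] = ()) -> None:
        self.root: Dict[int, dict] = {}
        for seq in sequences:
            self.add(seq)

    def add(self, seq: Sequence[int]) -> None:
        node = self.root
        for tok in seq:
            node = node.setdefault(tok, {})

    def get(self, prefix: Sequence[int]) -> List[int]:
        """Return the token ids allowed after `prefix` (empty if none)."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []
            node = node[tok]
        return list(node.keys())


def build_trie(
    titles: Iterable[str], encode: Callable[[str], List[int]]
) -> PrefixTrie:
    """Build a trie from entity titles using the given encoder."""
    return PrefixTrie(encode(title) for title in titles)


# Only continuations that lead to a known title are allowed.
trie = build_trie(["Germany", "Germans"], toy_encode)
allowed = trie.get([ord(ch) for ch in "German"])
```

Here `allowed` contains the ids for "y" and "s" only. If "Germans" were absent from the entity set, the decoder could not produce it at all.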

dineshkh commented 3 years ago

Thanks, Nicola, for the reply. Can you provide your version of the WikilinksNED Unseen-Mentions test file? Also, which model did you use to generate the Table 6 numbers (fairseq_entity_disambiguation_blink or fairseq_entity_disambiguation_aidayago)? And did you fine-tune on the WikilinksNED Unseen-Mentions train set?

nicola-decao commented 3 years ago

Unfortunately, no. I do not have access to the machines I used for these experiments.

For Table 6, I think I used fairseq_entity_disambiguation_blink, and I fine-tuned on the WikilinksNED Unseen-Mentions train set.