facebookresearch / GENRE

Autoregressive Entity Retrieval

'NIL' not in kilt_titles_trie_dict.pkl #57

Closed HuiBinR closed 3 years ago

HuiBinR commented 3 years ago

Goal: I am using the model to test on some out-of-domain entity disambiguation data. Problem: I looked for NIL in the test result file, but found none. Does kilt_titles_trie_dict.pkl not contain the entity NIL?

What I did: I loaded kilt_titles_trie_dict.pkl as a dict and tried to look up the token ids that NIL is tokenized to, but I can't find them.

import pickle
from genre.trie import Trie

with open("datasets/kilt_titles_trie_dict.pkl", "rb") as f:
    a = pickle.load(f)
    trie = Trie.load_from_dict(a)

# `tokenizer` is not defined in this snippet; it is loaded elsewhere.
g = tokenizer.tokenize('NIL')
print(g)
print(tokenizer('NIL'))
# Output:
# ['N', 'IL']
# {'input_ids': [0, 487, 3063, 2], 'attention_mask': [1, 1, 1, 1]}

print(a[2][487][3063][2])   # raises KeyError: 2

nicola-decao commented 3 years ago

"NIL" is not in kilt_titles_trie_dict.pkl because it is not a title in KILT.
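
For anyone who wants to check whether a given token sequence is stored without hitting a KeyError, here is a minimal sketch. It assumes the loaded pickle is a plain nested dict keyed by token id, as the indexing in the question suggests. The helper names `longest_prefix_in_trie` and `contains_sequence` are hypothetical and not part of GENRE, and the toy trie uses made-up ids rather than real KILT data.

```python
def longest_prefix_in_trie(trie_dict, token_ids):
    """Return how many leading ids of token_ids can be followed in the nested dict."""
    node = trie_dict
    depth = 0
    for tok in token_ids:
        # Stop as soon as the next id is not a child of the current node.
        if not isinstance(node, dict) or tok not in node:
            break
        node = node[tok]
        depth += 1
    return depth


def contains_sequence(trie_dict, token_ids):
    """True only if every id in token_ids can be followed from the root."""
    return longest_prefix_in_trie(trie_dict, token_ids) == len(token_ids)


# Toy trie with made-up ids (assumption, not KILT data):
# it stores the single sequence 2 -> 10 -> 11 -> 2.
toy = {2: {10: {11: {2: {}}}}}

print(contains_sequence(toy, [2, 10, 11, 2]))     # True
print(contains_sequence(toy, [2, 487, 3063, 2]))  # False
print(longest_prefix_in_trie(toy, [2, 10, 99]))   # 2
```

Running `longest_prefix_in_trie` on the dict from the question would show at which position the lookup stops, instead of raising.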