facebookresearch / GENRE

Autoregressive Entity Retrieval

Leveraging Trie in E2E Entity Linking #59

Closed mklimasz closed 3 years ago

mklimasz commented 3 years ago

Hi! First, awesome work on both GENRE and mGENRE :)

I'm having trouble using the KILT Wikipedia trie in the end-to-end (E2E) entity linking setting. As far as I understand, it queries the trie here. However, the KILT trie has a root node equal to [2], which isn't present there. I tried adding it, but that isn't sufficient. Could you take a look?

trie.trie_dict.keys()
# dict_keys([2])

Full code below:

import pickle
from genre.trie import Trie
from genre.hf_model import GENRE
from genre.entity_linking import get_end_to_end_prefix_allowed_tokens_fn_hf as get_prefix_allowed_tokens_fn

with open("./data/kilt_titles_trie_dict.pkl", "rb") as f:
    trie = Trie.load_from_dict(pickle.load(f))

model = GENRE.from_pretrained("./models/hf_e2e_entity_linking_wiki_abs").eval()
sentences = ["In 1921, Einstein received a Nobel Prize."]

# Without trie
# Generates nice results
prefix_allowed_tokens_fn = get_prefix_allowed_tokens_fn(model, sentences)
model.sample(
    sentences,
    prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
)

# [[{'text': 'In { 1921 } [ List of Nobel laureates in Physiology or Medicine by year of appointment ], { Einstein } [ Albert Einstein ] received a { Nobel } [ Nobel Prize in Physics ] Prize.',  'logprob': tensor(-0.9672)}], ...

# With trie
# Generates weird results
prefix_allowed_tokens_fn = get_prefix_allowed_tokens_fn(model, sentences, candidates_trie=trie)
model.sample(
    sentences,
    prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
)

# [[{'text': 'In { 1921, Einstein received a Nobel Prize. } and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and ...

nicola-decao commented 3 years ago

Hi!

You need to generate a different trie. As shown in https://github.com/facebookresearch/GENRE/tree/main/examples_genre, the trie for entity linking (EL) differs from the one for entity disambiguation (ED). The EL trie does not have the root [2]; instead, each entry has to start with } (see below).

prefix_allowed_tokens_fn = get_prefix_allowed_tokens_fn(
    model,
    sentences,
    candidates_trie=Trie([
        model.encode(" }} [ {} ]".format(e))[1:].tolist()
        for e in ["Albert Einstein", "Nobel Prize in Physics", "NIL"]
    ])
)

Thus, if you want to use the KILT BPE trie, you need to generate a new one, like this:

kilt_titles = [...]  # placeholder: replace with the list of all KILT titles
new_kilt_trie = Trie([
    model.encode(" }} [ {} ]".format(e))[1:].tolist()
    for e in kilt_titles + ["NIL"]
])
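
To see why the ED-style trie cannot simply be reused, here is a minimal, self-contained sketch. It does not use the GENRE library: `TinyTrie` is a hypothetical stand-in for a token-id prefix trie, and all token ids are made up for illustration. It also assumes that ED entries are laid out as [2] + title tokens + [2] and that the E2E lookup prefix starts with the token for }.

```python
from typing import Dict, Iterable, List


class TinyTrie:
    """Hypothetical stand-in for a token-id prefix trie (not genre.trie.Trie)."""

    def __init__(self, sequences: Iterable[List[int]] = ()):
        self.trie_dict: Dict[int, dict] = {}
        for seq in sequences:
            node = self.trie_dict
            for tok in seq:
                node = node.setdefault(tok, {})

    def get(self, prefix: List[int]) -> List[int]:
        """Return the token ids allowed after `prefix`, or [] if it is unknown."""
        node = self.trie_dict
        for tok in prefix:
            if tok not in node:
                return []
            node = node[tok]
        return list(node.keys())


# Made-up token ids, chosen only for illustration.
EOS, CLOSE_MENTION, OPEN_ENTITY, CLOSE_ENTITY = 2, 10, 11, 12
EINSTEIN = [100, 101]

# ED-style entry (assumed layout): rooted at token 2, like the KILT trie above.
ed_trie = TinyTrie([[EOS] + EINSTEIN + [EOS]])
# EL-style entry: the tokens of " } [ title ]".
el_trie = TinyTrie([[CLOSE_MENTION, OPEN_ENTITY] + EINSTEIN + [CLOSE_ENTITY]])

ed_roots = list(ed_trie.trie_dict.keys())
el_roots = list(el_trie.trie_dict.keys())

# A lookup that starts at the closing-mention token only succeeds in the EL trie.
ed_lookup = ed_trie.get([CLOSE_MENTION, OPEN_ENTITY])
el_lookup = el_trie.get([CLOSE_MENTION, OPEN_ENTITY])

print(ed_roots, el_roots, ed_lookup, el_lookup)
# [2] [10] [] [100]
```

In this toy setup the ED-style trie returns no allowed tokens for a prefix that begins with the closing-mention token, which is consistent with the degenerate output reported above, although the real decoding logic may differ in its details.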