Babelscape / rebel

REBEL is a seq2seq model that simplifies Relation Extraction (EMNLP 2021).

Is it possible to specify which word I want to generate relations for? #62

Closed: averieso closed this issue 1 year ago

averieso commented 1 year ago

For example, given this paragraph, can I get only the relations of Ike Turner? I'm asking especially because the generated results do not cover all the relations in the text.

Tina Turner (born Anna Mae Bullock; November 26, 1939 – May 24, 2023) was an American-born singer. Known as the "Queen of Rock 'n' Roll", she rose to prominence as the lead singer of the Ike & Tina Turner Revue before launching a successful career as a solo performer. She was noted for her "swagger, sensuality, powerful gravelly vocals and unstoppable energy", along with her well publicized history with ex-husband Ike Turner and her famous legs.

LittlePea13 commented 1 year ago

Hi @averieso

That's an interesting question. Since REBEL is a seq2seq model, the only way to condition it to generate triplets for a given entity is to force the decoder to start with that entity, here Ike Turner. To do so, you need to pass the model "decoder_input_ids" containing the prefix you want it to start from.
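A minimal sketch of forcing the decoder prefix with Hugging Face transformers. This is not the code from the Colab linked at the end; it assumes the public "Babelscape/rebel-large" checkpoint, a 256-token limit, and that the decoder sequence is "decoder start token, optional forced <s>, then the tokenized prefix". The helper names `decoder_prefix_ids` and `generate_with_prefix` are hypothetical. Since the config of the checkpoint decides whether a separate forced BOS token is needed, the helper reads both ids from `model.config` instead of hard-coding them.

```python
from typing import List, Optional, Sequence


def decoder_prefix_ids(decoder_start_id: int,
                       forced_bos_id: Optional[int],
                       prompt_ids: Sequence[int]) -> List[int]:
    """Hypothetical helper: start token, optional BOS, then the forced prefix ids."""
    ids = [decoder_start_id]
    # BART-style configs may force <s> right after the start token; add it only once.
    if forced_bos_id is not None and forced_bos_id != decoder_start_id:
        ids.append(forced_bos_id)
    return ids + list(prompt_ids)


def generate_with_prefix(text: str,
                         prefix: str = "<triplet> Ike Turner",
                         model_name: str = "Babelscape/rebel-large",
                         **gen_kwargs) -> List[str]:
    """Hypothetical helper: generate REBEL output whose decoding starts with `prefix`."""
    # Imported here so decoder_prefix_ids works without torch/transformers installed.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(text, max_length=256, truncation=True, return_tensors="pt")

    # "<triplet>" is an added token, so it stays whole; " Ike Turner" is regular BPE.
    prompt_ids = tokenizer(prefix, add_special_tokens=False).input_ids
    cfg = model.config
    prefix_ids = decoder_prefix_ids(cfg.decoder_start_token_id,
                                    getattr(cfg, "forced_bos_token_id", None),
                                    prompt_ids)

    gen_kwargs.setdefault("max_length", 256)  # assumed limit
    outputs = model.generate(**inputs,
                             decoder_input_ids=torch.tensor([prefix_ids]),
                             **gen_kwargs)
    # Keep special tokens so the <triplet>/<subj>/<obj> markers survive decoding.
    return tokenizer.batch_decode(outputs, skip_special_tokens=False)
```

Calling `generate_with_prefix(paragraph)` on the Tina Turner paragraph should yield a decoding that begins with "<triplet> Ike Turner", since generation only continues after the supplied prefix.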

So if we input your text to REBEL, we obtain:

<s><triplet> Tina Turner <subj> November 26, 1939 <obj> date of birth <subj> May 24, 2023 <obj> date of death <subj> Ike Turner <obj> spouse <triplet> Ike Turner <subj> Tina Turner <obj> spouse <subj> Tina Turner <obj> spouse <triplet> Tina Turner <subj> November 26, 1939 <obj> date of birth <subj> May 24, 2023 <obj> date of death <subj> Ike Turner <obj> spouse</s>

It contains some repeated triplets because Tina Turner appears twice in the text, but they are correct. Notice that it did predict one relation for Ike Turner: {'head': 'Ike Turner', 'type': 'spouse', 'tail': 'Tina Turner'}
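To turn a decoding like the one above into head/type/tail dicts, one can walk the marker tokens. This sketch assumes the layout visible in these outputs, "<triplet> head <subj> tail <obj> relation", where extra "<subj> tail <obj> relation" groups reuse the same head (so <subj> introduces the tail and <obj> the relation label). It is written in the spirit of the `extract_triplets` helper on the REBEL model card but is not a copy of it; `unique` and `for_head` are hypothetical post-processing helpers for dropping repeats and keeping one entity's triplets.

```python
from typing import Dict, List

Triplet = Dict[str, str]


def extract_triplets(text: str) -> List[Triplet]:
    """Parse REBEL's linearized output into {'head', 'type', 'tail'} dicts."""
    for special in ("<s>", "</s>", "<pad>"):
        text = text.replace(special, " ")
    triplets: List[Triplet] = []
    head: List[str] = []
    tail: List[str] = []
    rel: List[str] = []
    state = None

    def flush() -> None:
        # Emit the pending triplet only when head, tail and relation are all present.
        if head and tail and rel:
            triplets.append({"head": " ".join(head), "type": " ".join(rel),
                             "tail": " ".join(tail)})

    for token in text.split():
        if token == "<triplet>":      # a new head entity starts
            flush()
            head, tail, rel = [], [], []
            state = "head"
        elif token == "<subj>":       # a new (tail, relation) pair for the same head
            flush()
            tail, rel = [], []
            state = "tail"
        elif token == "<obj>":        # the relation label follows
            rel = []
            state = "rel"
        elif state == "head":
            head.append(token)
        elif state == "tail":
            tail.append(token)
        elif state == "rel":
            rel.append(token)
    flush()
    return triplets


def unique(triplets: List[Triplet]) -> List[Triplet]:
    """Drop exact repeats while keeping the original order."""
    seen, out = set(), []
    for t in triplets:
        key = (t["head"], t["type"], t["tail"])
        if key not in seen:
            seen.add(key)
            out.append(t)
    return out


def for_head(triplets: List[Triplet], head: str) -> List[Triplet]:
    """Keep only the triplets whose head is the given entity."""
    return [t for t in triplets if t["head"] == head]


SAMPLE = (
    "<s><triplet> Tina Turner <subj> November 26, 1939 <obj> date of birth "
    "<subj> May 24, 2023 <obj> date of death <subj> Ike Turner <obj> spouse "
    "<triplet> Ike Turner <subj> Tina Turner <obj> spouse <subj> Tina Turner <obj> spouse "
    "<triplet> Tina Turner <subj> November 26, 1939 <obj> date of birth "
    "<subj> May 24, 2023 <obj> date of death <subj> Ike Turner <obj> spouse</s>"
)

if __name__ == "__main__":
    print(for_head(unique(extract_triplets(SAMPLE)), "Ike Turner"))
    # [{'head': 'Ike Turner', 'type': 'spouse', 'tail': 'Tina Turner'}]
```

On the output above, this yields 8 triplets, 4 of them unique, and exactly one with Ike Turner as the head.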

Since you are interested in triplets about Ike Turner, you can force the model to start generating with Ike Turner, so that decoding continues from there. If we do so, we obtain:

<s><triplet> Ike Turner <subj> Tina Turner <obj> spouse <subj> Tina Turner <obj> spouse</s>

So now it only produced the predictions for Ike, but this is a bit underwhelming: it is the same triplet we already had among the other predictions. Want more? We can play with the length penalty to push the model to generate longer outputs. If we set the length penalty to 2:

<s><triplet> Ike Turner <subj> Tina Turner <obj> spouse <subj> Tina Turner <obj> spouse <triplet> Tina Turner <subj> Ike Turner <obj> spouse <subj> Ike Turner <obj> spouse <triplet> Ike Turner <subj> Tina Turner <obj> spouse <subj> Tina Turner <obj> spouse</s>
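Why a length penalty of 2 yields longer outputs can be seen with a small worked example. As we understand Hugging Face beam search, a finished hypothesis is ranked roughly by its summed log-probability divided by length ** length_penalty, and the setting only matters when num_beams > 1. Because log-probabilities are negative, a larger exponent shrinks long hypotheses' scores toward zero faster, favouring them. The two beams below (lengths and log-probabilities) are made-up numbers, not REBEL outputs.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Length-normalized score of a finished hypothesis (assumed HF-style rule)."""
    return sum_logprobs / (length ** length_penalty)


def preferred(length_penalty: float) -> str:
    """Which of two hypothetical beams ranks higher for a given length_penalty."""
    short_score = beam_score(-3.0, 20, length_penalty)  # hypothetical short output
    long_score = beam_score(-8.0, 40, length_penalty)   # hypothetical longer output
    return "long" if long_score > short_score else "short"


if __name__ == "__main__":
    for lp in (0.0, 1.0, 2.0):
        print(lp, preferred(lp))
    # 0.0 short
    # 1.0 short
    # 2.0 long
```

With length_penalty 0 or 1 the shorter beam wins (-3.0 vs -8.0, then -0.15 vs -0.2), while at 2 the longer one does (-0.005 vs -0.0075).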

The model did predict more, but not what we wanted: it moved on to another entity by generating the <triplet> token. But we can tell the model not to do that using bad_words_ids! There are plenty of options to play around with, but they will also lead to less "probable" outputs from the model, i.e. results with a higher perplexity. For instance, combining the previous tricks to force the model to keep generating triplets for Ike Turner leads to:

<s><triplet> Ike Turner <subj> Tina Turner <obj> spouse <subj> Tina Turner <obj> spouse <subj> famous legs <obj> notable work</s>
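A sketch of how the combined settings might be expressed as `generate()` keyword arguments, plus a toy illustration of what a single-token bad_words_ids entry does: at every step the banned token's score is set to -inf, so the next-best token (here <subj>, i.e. another tail for the same head) is chosen instead. The helper names, num_beams=3, max_length=256 and the step scores are assumptions for illustration; in real use the id would come from something like `tokenizer.convert_tokens_to_ids("<triplet>")`.

```python
from typing import Dict, Sequence


def rebel_gen_kwargs(triplet_id: int,
                     length_penalty: float = 2.0,
                     num_beams: int = 3) -> Dict[str, object]:
    """Hypothetical helper: generate() kwargs biasing length and banning <triplet>."""
    return {
        "num_beams": num_beams,            # length_penalty only affects beam search
        "length_penalty": length_penalty,  # > 0 favours longer finished beams
        "bad_words_ids": [[triplet_id]],   # never emit <triplet>, so no new head entity
        "max_length": 256,                 # assumed limit
    }


def ban_tokens(scores: Dict[str, float], banned: Sequence[str]) -> Dict[str, float]:
    """Toy stand-in for single-token bad_words_ids: banned tokens get -inf."""
    return {tok: (float("-inf") if tok in banned else s) for tok, s in scores.items()}


def greedy_pick(scores: Dict[str, float]) -> str:
    """Pick the highest-scoring token (greedy step, for illustration only)."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical next-token log-probs right after "... <obj> spouse"
    step = {"<triplet>": -0.2, "<subj>": -1.1, "</s>": -1.5}
    print(greedy_pick(step))                             # <triplet>
    print(greedy_pick(ban_tokens(step, ["<triplet>"])))  # <subj>
```

These kwargs would be passed together with the forced "decoder_input_ids"; the ban only applies to newly generated tokens, so a <triplet> inside the forced prefix is unaffected.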

You can find all the code for these here: https://colab.research.google.com/drive/1InCUFNUolyooxGoHES-sKggKN0UcCad5#scrollTo=l_mexdGGHRAT