XiangLi1999 / PrefixTuning

Prefix-Tuning: Optimizing Continuous Prompts for Generation

Use --init_shallow_word for seq2seq model #32

Open JaniceXiong opened 2 years ago

JaniceXiong commented 2 years ago

Hi, thanks for your wonderful work! I have some questions about the released code. I saw that "--init_shallow_word" is used with the GPT-2 model (GPT2LMHeadModel), so that prev_key and prev_value can be initialized from a provided word such as "summarize". https://github.com/XiangLi1999/PrefixTuning/blob/0eb23e401bfb7f00aceb48af5cc77573fed90e29/gpt2/train_e2e.py#L34-L35
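For context, my understanding of what get_gold_init does for GPT-2 is roughly the following sketch. This is written against the upstream transformers API rather than the exact code in this repo, and the checkpoint name is only a placeholder:

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Placeholder checkpoint; the repo may load a different GPT-2 size.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

# Encode the init word (e.g. "summarize") and run one forward pass with caching on.
init_ids = tokenizer(" summarize", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(init_ids, use_cache=True)

# past_key_values is a tuple with one (key, value) pair per layer,
# each tensor shaped (batch, num_heads, seq_len, head_dim);
# these activations are what the prefix gets initialized from.
past_key_values = out.past_key_values
```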

If I want to use this trick with a seq2seq model (BartForConditionalGeneration), where and how should I change your code? I found that directly calling the "get_gold_init" function does not work. https://github.com/XiangLi1999/PrefixTuning/blob/6519d30e69b15a180f23e2cd41b766d3f62b8e82/gpt2/train_control.py#L184-L191

It seems that the BartModel forward function does not return "past_key_values", either because "use_cache" is set to False or because its return format differs from that of the GPT2LMHeadModel forward function. I haven't figured this out yet, and any reply would be helpful :) @XiangLi1999 https://github.com/XiangLi1999/PrefixTuning/blob/6519d30e69b15a180f23e2cd41b766d3f62b8e82/transformers/src/transformers/modeling_bart.py#L1242-L1243
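To make the question concrete, this is roughly what I would expect to work against the upstream transformers BART implementation (the modified modeling_bart.py in this repo may behave differently; the checkpoint name and decoder inputs below are only placeholders):

```python
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

# Placeholder checkpoint for illustration.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

enc = tokenizer("summarize", return_tensors="pt")
with torch.no_grad():
    out = model(
        input_ids=enc.input_ids,
        decoder_input_ids=enc.input_ids,  # placeholder decoder input for the init word
        use_cache=True,                   # without this, past_key_values comes back empty
    )

# Unlike GPT-2, each layer entry holds four tensors:
# (decoder self-attn key, decoder self-attn value, cross-attn key, cross-attn value),
# so a prefix initialization for BART would also have to handle the cross-attention states.
past_key_values = out.past_key_values
```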

Timothyxxx commented 2 years ago

https://github.com/XiangLi1999/PrefixTuning/issues/31#issuecomment-1000117367