Thanks for open-sourcing your work. I really appreciate your simple yet parameter-efficient method for tuning PLMs.
In fact, I had a hard time re-implementing your original experiments.
Until I realized that you had modified modeling_gpt2.py / GPT2LMHeadModel.prepare_inputs_for_generation() (and possibly made small changes in generation_utils.py), the results were truly mysterious.
The function mentioned above is essential for making this method actually work: it preserves the past_key_values that are passed in. Otherwise, the PLM will not incorporate the learned prefix embeddings during generation.
It was a really painful process to track this down. You hinted at the modifications to the data collators, but not at the generation part of transformers, which is a critical piece of the implementation. Meh 😕.
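
For anyone else stuck on this, here is a minimal sketch of the kind of override involved. It is **not** the authors' exact code: the class name and the `prefix_past` attribute are my own placeholders, the kwarg may be named `past` or `past_key_values` depending on your transformers version, and attention-mask/position-id handling is omitted.

```python
# Minimal sketch, NOT the authors' implementation: keep the learned prefix
# key/value states available as past_key_values at every generation step.
from transformers import GPT2LMHeadModel


class PrefixGPT2LMHeadModel(GPT2LMHeadModel):
    def prepare_inputs_for_generation(self, input_ids, past=None, **kwargs):
        if past is None:
            # First decoding step: inject the learned prefix (a tuple of per-layer
            # key/value tensors, stored here in a hypothetical attribute set before
            # calling generate()) so the model attends to it.
            past = getattr(self, "prefix_past", None)
        else:
            # Later steps: the cache already contains prefix + generated tokens,
            # so only the most recent token needs to be fed in.
            input_ids = input_ids[:, -1].unsqueeze(-1)
        return {
            "input_ids": input_ids,
            "past_key_values": past,
            "use_cache": kwargs.get("use_cache", True),
        }
```

The key point is simply that the returned dict keeps forwarding the prefix cache instead of silently dropping it, so that generate() actually conditions on the learned prefix.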
Hope this helps other visitors.