Thanks for open-sourcing your work. I really appreciate your simple yet parameter-efficient method for tuning PLMs.
In fact, I had a hard time re-implementing your original experiments.
Until I realized that you had modified modeling_gpt2.py / GPT2LMHeadModel.prepare_inputs_for_generation() (and possibly made small changes in generation_utils.py), the results were truly mysterious.
The function mentioned above is essential for making this method actually work: it preserves the past_key_values that are passed in. Otherwise, the PLM will not incorporate the learned prefix embeddings during generation.
It was a really painful process to track this down. You hinted at the modifications to the data collators, but not at the generation part of transformers, which is a critical piece of the implementation. Meh 😕.
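
For anyone else stuck on this, here is a minimal sketch of the kind of override involved. It is **not** the authors' exact code: the class name and the `prefix_past` attribute are my own placeholders, the kwarg may be named `past` or `past_key_values` depending on your transformers version, and attention-mask/position-id handling is omitted.

```python
# Minimal sketch, NOT the authors' implementation: keep the learned prefix
# key/value states available as past_key_values at every generation step.
from transformers import GPT2LMHeadModel


class PrefixGPT2LMHeadModel(GPT2LMHeadModel):
    def prepare_inputs_for_generation(self, input_ids, past=None, **kwargs):
        if past is None:
            # First decoding step: inject the learned prefix (a tuple of per-layer
            # key/value tensors, stored here in a hypothetical attribute set before
            # calling generate()) so the model attends to it.
            past = getattr(self, "prefix_past", None)
        else:
            # Later steps: the cache already contains prefix + generated tokens,
            # so only the most recent token needs to be fed in.
            input_ids = input_ids[:, -1].unsqueeze(-1)
        return {
            "input_ids": input_ids,
            "past_key_values": past,
            "use_cache": kwargs.get("use_cache", True),
        }
```

The key point is simply that the returned dict keeps forwarding the prefix cache instead of silently dropping it, so that generate() actually conditions on the learned prefix.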
Hope this helps other visitors.