THUDM / P-tuning

A novel method to tune language models. Code and datasets for the paper "GPT Understands, Too".

Prompt Length in SuperGLUE #6

Closed Shamdan17 closed 3 years ago

Shamdan17 commented 3 years ago

Hello,

Thank you for your work and for releasing the code! I just have a few questions regarding the size of the prompt embeddings:

  • From the scripts you shared for the SuperGLUE tasks, the chosen pattern id is 1 for most tasks (except WSC, which uses 2). If I understood correctly, you discard the original notion of patterns and instead use the pattern id to denote the number of prompt embeddings to be trained. Does this mean you are using a single prompt embedding vector for most tasks?
  • If so, is there a specific reason why the LSTM performs better than the MLP in this case? If I understood correctly, one of the reasons the LSTM was used is to address the association problem and make the different prompt embeddings dependent on one another. Would this problem even arise with just one prompt embedding?

Thank you for your work and cooperation!

zheng-yanan commented 3 years ago

  1. Yes, the pattern_id is used to denote the number of prompt embeddings. For the few-shot SuperGLUE tasks, most tasks use a single prompt embedding.
  2. According to our experimental results, in the few-shot setting with a single prompt embedding, the LSTM and the MLP yield similar results with only subtle differences. The association problem is more noticeable in knowledge probing, which uses 6-9 prompt embeddings.
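
To make this concrete, here is a minimal PyTorch sketch of the idea discussed above (a simplified illustration, not the code in this repository; `PromptEncoder`, `num_prompt_tokens`, and `encoder_type` are hypothetical names): the pattern id sets the number of trainable prompt embeddings, which are reparameterized through either an LSTM or an MLP before being spliced into the model's input embeddings.

```python
import torch
import torch.nn as nn


class PromptEncoder(nn.Module):
    """Hypothetical sketch of a P-tuning-style prompt encoder (names made up).

    num_prompt_tokens plays the role of the pattern id in this thread:
    1 for most SuperGLUE tasks, 2 for WSC, 6-9 for knowledge probing.
    """

    def __init__(self, num_prompt_tokens: int, hidden_size: int,
                 encoder_type: str = "lstm"):
        super().__init__()
        self.register_buffer("prompt_ids", torch.arange(num_prompt_tokens))
        # Raw trainable prompt embeddings, one vector per prompt token.
        self.embedding = nn.Embedding(num_prompt_tokens, hidden_size)
        if encoder_type == "lstm":
            # A bidirectional LSTM lets the prompt vectors depend on each
            # other (the "association" discussed above); with a single
            # prompt token there is no sequence to model, so an MLP does
            # about as well.
            self.lstm = nn.LSTM(hidden_size, hidden_size // 2, num_layers=2,
                                bidirectional=True, batch_first=True)
        elif encoder_type == "mlp":
            self.lstm = None
        else:
            raise ValueError(f"unknown encoder_type: {encoder_type}")
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
        )

    def forward(self) -> torch.Tensor:
        # (1, num_prompt_tokens, hidden_size)
        x = self.embedding(self.prompt_ids).unsqueeze(0)
        if self.lstm is not None:
            x, _ = self.lstm(x)
        x = self.mlp(x)
        # (num_prompt_tokens, hidden_size), spliced into the input embeddings
        return x.squeeze(0)


# Single prompt embedding (pattern id 1), as in most of the SuperGLUE scripts:
encoder = PromptEncoder(num_prompt_tokens=1, hidden_size=768, encoder_type="lstm")
prompt = encoder()  # shape: (1, 768)
```

With `num_prompt_tokens=1` the LSTM and MLP variants are nearly equivalent, which matches the observation above that the two perform similarly in the few-shot setting.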

Thank you.