THUDM / P-tuning-v2

An optimized deep prompt tuning strategy comparable to fine-tuning across scales and tasks
Apache License 2.0

What are the main contributions of P-tuning? #76

Open 2catycm opened 2 months ago

2catycm commented 2 months ago

If it is just an implementation of existing methods and is not novel, why was the P-tuning paper accepted at a top CCF-A conference, and why is it so widely cited?

So I wonder what the core difference is between P-tuning, prefix tuning, and deep soft prompt tuning.

From my literature review, it seems that prepending to K and V was not actually proposed in prefix tuning, yet many papers wrongly describe prefix tuning as modifying K and V. So is that actually your invention? To my knowledge, prefix tuning is like deep visual prompt tuning in Jia's paper, which prepends prompts to the input x at each layer, not to K and V.
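To make sure I am asking about the right thing, here is a rough PyTorch sketch of the two variants as I understand them; the class names and shapes are mine, not from your repo or paper:

```python
import torch
import torch.nn as nn

class PrependToInput(nn.Module):
    """Variant A: prepend trainable prompt vectors to the layer input x,
    the way deep visual prompt tuning does at every layer."""
    def __init__(self, num_prompts: int, hidden: int):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, hidden))

    def forward(self, x):                        # x: (batch, seq, hidden)
        p = self.prompts.expand(x.size(0), -1, -1)
        return torch.cat([p, x], dim=1)          # attention then sees [prompts; x]

class PrependToKV(nn.Module):
    """Variant B: prepend trainable key/value vectors inside attention,
    leaving the query stream (x itself) untouched."""
    def __init__(self, num_prompts: int, hidden: int):
        super().__init__()
        self.prefix_k = nn.Parameter(torch.randn(num_prompts, hidden))
        self.prefix_v = nn.Parameter(torch.randn(num_prompts, hidden))

    def forward(self, k, v):                     # k, v: (batch, seq, hidden)
        pk = self.prefix_k.expand(k.size(0), -1, -1)
        pv = self.prefix_v.expand(v.size(0), -1, -1)
        return torch.cat([pk, k], dim=1), torch.cat([pv, v], dim=1)
```

My current reading is that Jia's deep visual prompt tuning corresponds to Variant A, while your method corresponds to Variant B; please correct me if that mapping is wrong.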

I also find it worth noting that your work uses the KV cache that Hugging Face transformers provides as an important implementation building block. Is that also a contribution?
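For concreteness, this is roughly what I mean by "using the KV cache": passing trainable prefix key/value tensors through the `past_key_values` argument of a Hugging Face model. The model name, zero-initialised prefixes, and shapes below are my own assumptions for illustration, not code from this repo:

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "bert-base-uncased"          # any encoder whose forward accepts past_key_values
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)
cfg = model.config

batch, prefix_len = 2, 8
head_dim = cfg.hidden_size // cfg.num_attention_heads
# One (key, value) pair per layer, each of shape (batch, num_heads, prefix_len, head_dim).
# In a real prefix/P-tuning setup these would come from a trainable prefix encoder;
# zeros are used here only to show the plumbing.
past_key_values = tuple(
    (torch.zeros(batch, cfg.num_attention_heads, prefix_len, head_dim),
     torch.zeros(batch, cfg.num_attention_heads, prefix_len, head_dim))
    for _ in range(cfg.num_hidden_layers)
)

enc = tok(["first example", "second example"], return_tensors="pt", padding=True)
# The attention mask has to cover the prefix positions as well.
prefix_mask = torch.ones(batch, prefix_len)
attention_mask = torch.cat([prefix_mask, enc["attention_mask"]], dim=1)

out = model(input_ids=enc["input_ids"],
            attention_mask=attention_mask,
            past_key_values=past_key_values)
print(out.last_hidden_state.shape)   # (batch, seq_len, hidden_size)
```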

2catycm commented 2 months ago

I have read your paper, but I am not familiar with NLP terms, so I cannot understand your contributions. In the paper, your method seems exactly the same as prefix tuning and P-tuning v1, just changing the evaluation datasets from NLG to NLU. In your method section, you provide a table to clarify your contribution, saying that your method has Reparam., Deep PT, Multi-task, and No Verb.

But I got confused because the paper does not directly explain what these terms mean. I have the following questions: