HiXiaochen opened this issue 2 years ago
Hi,
I think this is quite dataset dependent. It seems to require more epochs than regular fine-tuning... but one thing to mention is that I also used a warmup of 100 steps, plus the initialization trick (pure prefix-tuning with random initialization doesn't work better than regular fine-tuning), and hyper-parameter tuning of the initialization trick is quite essential.
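For anyone else landing here: a minimal sketch of what such an initialization trick can look like, seeding the trainable prefix from the PLM's embeddings of a task-related word instead of random values. Note this is illustrative and not the repo's actual code; names like `prefix_embedding` are hypothetical, and the paper initializes the prefix's activations at every layer from real words, while this sketch only seeds an embedding-level prefix for brevity.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Task-related word(s) used to seed the prefix (e.g. "summarization").
init_text = "summarization"
init_ids = tokenizer(init_text, return_tensors="pt").input_ids  # (1, prefix_len)

with torch.no_grad():
    # Look up the PLM's word embeddings for these tokens...
    init_embeds = model.get_input_embeddings()(init_ids).squeeze(0)  # (prefix_len, hidden)

# ...and copy them into the trainable prefix instead of random init.
prefix_embedding = torch.nn.Parameter(init_embeds.clone())
```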
Thanks! How about the learning rate? Since we are only tuning the prefix parameters, I suspect that an ordinary lr like 5e-5, as used in most fine-tuning scenarios, may not work well, because it isn't the same as fine-tuning all parameters of the PLM. Furthermore, in low-resource scenarios, should I use a much smaller lr or a larger one? Thanks for your consideration!
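To make the setup concrete, here is a sketch of the training configuration being discussed: freeze the PLM so only the prefix receives gradients, give the prefix its own optimizer, and add the 100-step warmup mentioned above. The 5e-5 value is just the baseline from this thread to sweep around (prefix-only tuning may well want a larger lr); all names here are illustrative, not the repo's actual code.

```python
import torch
from transformers import GPT2LMHeadModel, get_linear_schedule_with_warmup

model = GPT2LMHeadModel.from_pretrained("gpt2")

# Freeze the PLM: only the prefix parameters will be updated.
for param in model.parameters():
    param.requires_grad = False

# Hypothetical trainable prefix (length 10); in practice seed it with the
# initialization trick from the sketch above rather than random values.
prefix_embedding = torch.nn.Parameter(torch.randn(10, model.config.n_embd) * 0.02)

# Optimize only the prefix; 5e-5 is a starting point to tune from.
optimizer = torch.optim.AdamW([prefix_embedding], lr=5e-5)

num_training_steps = 1000  # placeholder; derive from dataset size and epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,  # the 100-step warmup mentioned in the reply above
    num_training_steps=num_training_steps,
)
```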
Hi Lisa, I noticed that in your paper you used task-related words like "summarization" to initialize the prefix, but I found that the prefix-tuning code for BART doesn't contain this part, right?
Hi Lisa, could you please specify the initialization trick you used on the table-to-text task while tuning the prefix with GPT-2? Thanks in advance!
Very good work! What I'd like to ask is: what are the hyper-parameter settings for the low-resource summarization scenario, such as the learning rate and the number of epochs? I have tried applying prefix-tuning to low-resource summarization tasks, but it doesn't seem to work very well...