HiXiaochen opened this issue 2 years ago
Hi,
I think this is quite dataset dependent. It seems to require more epochs than regular fine-tuning... but one thing to mention is that I also used a warmup of 100 steps, plus the initialization trick (pure prefix-tuning with random initialization doesn't work better than regular fine-tuning), and hyper-parameter tuning of the initialization trick is quite essential.
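For anyone else landing here: a minimal sketch of what such an initialization trick can look like, seeding the trainable prefix from the PLM's embeddings of a task-related word instead of random values. Note this is illustrative and not the repo's actual code; names like `prefix_embedding` are hypothetical, and the paper initializes the prefix's activations at every layer from real words, while this sketch only seeds an embedding-level prefix for brevity.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Task-related word(s) used to seed the prefix (e.g. "summarization").
init_text = "summarization"
init_ids = tokenizer(init_text, return_tensors="pt").input_ids  # (1, prefix_len)

with torch.no_grad():
    # Look up the PLM's word embeddings for these tokens...
    init_embeds = model.get_input_embeddings()(init_ids).squeeze(0)  # (prefix_len, hidden)

# ...and copy them into the trainable prefix instead of random init.
prefix_embedding = torch.nn.Parameter(init_embeds.clone())
```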
Thanks! How about the learning rate? Since we are only tuning the prefix parameters, I suspect that an ordinary lr like 5e-5, as used in most fine-tuning scenarios, may not work well, because it isn't the same as fine-tuning all parameters of the PLM. Furthermore, in low-resource scenarios, should I use a much smaller lr or a larger one? Thanks for your consideration!
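To make the setup concrete, here is a sketch of the training configuration being discussed: freeze the PLM so only the prefix receives gradients, give the prefix its own optimizer, and add the 100-step warmup mentioned above. The 5e-5 value is just the baseline from this thread to sweep around (prefix-only tuning may well want a larger lr); all names here are illustrative, not the repo's actual code.

```python
import torch
from transformers import GPT2LMHeadModel, get_linear_schedule_with_warmup

model = GPT2LMHeadModel.from_pretrained("gpt2")

# Freeze the PLM: only the prefix parameters will be updated.
for param in model.parameters():
    param.requires_grad = False

# Hypothetical trainable prefix (length 10); in practice seed it with the
# initialization trick from the sketch above rather than random values.
prefix_embedding = torch.nn.Parameter(torch.randn(10, model.config.n_embd) * 0.02)

# Optimize only the prefix; 5e-5 is a starting point to tune from.
optimizer = torch.optim.AdamW([prefix_embedding], lr=5e-5)

num_training_steps = 1000  # placeholder; derive from dataset size and epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,  # the 100-step warmup mentioned in the reply above
    num_training_steps=num_training_steps,
)
```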
Hi Lisa, I noticed that in your paper you used task-related words like "summarization" to initialize the prefix, but I found that the prefix-tuning code for BART doesn't contain this part, right?
Hi Lisa, could you please specify the initialization trick you used on the table-to-text task while tuning the prefix with GPT-2? Thanks in advance!
Very good work! What I'd like to ask is: what are the hyper-parameter settings for the low-resource summarization scenario, such as the learning rate and the number of epochs? I have tried applying prefix-tuning to low-resource summarization tasks, but it doesn't seem to work very well...