About the hyperparameters

ZhengxiangShi / DePT

[ICLR 2024] This is the repository for the paper titled "DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning"

http://arxiv.org/abs/2309.05173

MIT License


About the hyperparameters #6

Closed. HungerPWAY closed this issue 8 months ago.

HungerPWAY commented 8 months ago

Could you please tell me which hyperparameters are correct, those in the paper or those provided in the README, for example "weight decay", "batch size", and "warm-up steps"?

ZhengxiangShi commented 8 months ago

Thanks for your question. We perform a hyperparameter search for each task and select the best-performing model. We do not tune "weight decay," "batch size," or "warm-up steps." Our main focus is on the number of training steps (as a general rule, more is better) and on choosing appropriate learning rates.

yuchen3890 commented 8 months ago

I have the same question as HungerPWAY. In the paper, the "weight decay," "batch size," and "warm-up steps" are "0.01", "16", and "0.06 of the total steps", respectively, while the values in the provided script are "1e-5", "32", and "500". I would like to know which values were actually used in your experiments. Thank you very much!

ZhengxiangShi commented 8 months ago

Thank you so much for pointing this out. Please use the numbers in the GitHub repository. I will double-check and correct this issue!
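
For readers comparing the two settings, here is a minimal sketch that lays the values quoted in this thread side by side. The dictionary keys, the helper functions, and the total step count are hypothetical and chosen only for illustration; they are not the repository's actual argument names or training length.

```python
# Values as quoted in this thread. Key names are illustrative only,
# not the flags used by the DePT training scripts.
PAPER = {"weight_decay": 0.01, "batch_size": 16, "warmup_ratio": 0.06}
REPO_SCRIPT = {"weight_decay": 1e-5, "batch_size": 32, "warmup_steps": 500}


def warmup_steps_from_ratio(total_steps: int, ratio: float) -> int:
    """Convert a warm-up ratio into an absolute number of warm-up steps."""
    return round(total_steps * ratio)


def differing_keys(a: dict, b: dict) -> list:
    """Return the keys present in both configs whose values differ."""
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])


if __name__ == "__main__":
    # Hypothetical training length, used only to make the ratio concrete.
    total_steps = 30_000
    paper_warmup = warmup_steps_from_ratio(total_steps, PAPER["warmup_ratio"])
    print("paper warm-up steps:", paper_warmup)
    print("script warm-up steps:", REPO_SCRIPT["warmup_steps"])
    print("settings that differ:", differing_keys(PAPER, REPO_SCRIPT))
```

Note that a ratio of 0.06 matches a fixed 500 warm-up steps only when a run is roughly 8,333 steps long (500 / 0.06), so the two warm-up settings generally differ as well.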

HungerPWAY commented 8 months ago

Thank you for your reply!