HungerPWAY closed this issue 8 months ago
Thanks for your question. We run a hyperparameter search for each task and select the best-performing model. We do not tune "weight decay," "batch size," or "warm-up steps." Our main focus is on the number of training steps (as a general rule, more is better) and on choosing appropriate learning rates.
I have the same question as HungerPWAY. In the paper, "weight decay," "batch size," and "warm-up steps" are "0.01", "16", and "0.06 of the total steps", respectively, while the provided script uses "1e-5", "32", and "500". Could you tell me which values were actually used in your experiments? Thank you very much!
Thank you so much for pointing this out. Please use the values in the GitHub repository. I will double-check and correct this discrepancy!
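For anyone comparing the two settings, here is a minimal sketch that puts the values quoted in this thread side by side and converts a warm-up ratio into an absolute step count. The dictionary keys and the `warmup_steps_from_ratio` helper are hypothetical names for illustration, not taken from the repository's script, and rounding to the nearest step is an assumption (other implementations may round up or down).

```python
# Values as quoted in this thread; the key names are hypothetical.
PAPER_SETTINGS = {"weight_decay": 0.01, "batch_size": 16, "warmup_ratio": 0.06}
REPO_SETTINGS = {"weight_decay": 1e-5, "batch_size": 32, "warmup_steps": 500}


def warmup_steps_from_ratio(total_steps: int, warmup_ratio: float) -> int:
    """Convert a warm-up ratio into an absolute number of warm-up steps.

    Rounds to the nearest whole step (an assumption made for this sketch).
    """
    if total_steps < 0:
        raise ValueError("total_steps must be non-negative")
    if not 0.0 <= warmup_ratio <= 1.0:
        raise ValueError("warmup_ratio must be between 0 and 1")
    return round(total_steps * warmup_ratio)


if __name__ == "__main__":
    # The total step count here is illustrative, not a value from the paper.
    total_steps = 10_000
    paper_warmup = warmup_steps_from_ratio(total_steps, PAPER_SETTINGS["warmup_ratio"])
    print("paper-style warm-up steps:", paper_warmup)
    print("repo warm-up steps:", REPO_SETTINGS["warmup_steps"])
```

The two warm-up settings only coincide when 0.06 of the total steps equals 500, which is a run of roughly 8,333 steps. The thread does not state the total step count, so they will generally differ.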
Thank you for your reply!
Could you please tell me which hyperparameters are correct, those in the paper or those provided in the README, for settings such as "weight decay", "batch size", and "warm-up steps"?