Closed: Doragd closed this issue 2 years ago
Hi @Doragd,
Thanks for your interest. We follow the RoBERTa training setup, in which the number of warmup steps is about 6% of the total number of training steps. The numbers may look odd, but they are neither random nor hand-picked.
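As a rough illustration of the 6% rule, here is a minimal sketch of how such a warmup value can be derived. The function names, the rounding choice, and the dataset size, batch size, and epoch count in the example are assumptions made for illustration; they are not taken from the R-Drop scripts.

```python
import math


def total_training_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Number of optimizer updates, assuming one update per batch
    (no gradient accumulation) and a partial final batch per epoch."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


def warmup_steps(total_steps: int, ratio: float = 0.06) -> int:
    """Warmup steps as a fixed fraction of total steps.
    Rounding to the nearest integer is an assumption here."""
    return round(total_steps * ratio)


# Hypothetical task: 3668 training examples, batch size 16, 10 epochs.
total = total_training_steps(3668, 16, 10)
print(total, warmup_steps(total))
```

Because the total step count depends on dataset size, batch size, and epochs, the resulting warmup value differs per task and rarely lands on a round number.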
Thanks
Hi, thanks for your insightful work. I would like to ask about one detail. In https://github.com/dropreg/R-Drop/blob/main/huggingface_transformer_src/README.md, the values of the warmup steps hyperparameter look odd. How did you choose them, and how do they affect performance?