dropreg / R-Drop


How do the `warmup steps` affect performance? #28

Closed by Doragd 2 years ago

Doragd commented 2 years ago

Hi, bro. Thanks for your insightful work. I have a question about the warmup steps listed in https://github.com/dropreg/R-Drop/blob/main/huggingface_transformer_src/README.md. The values look odd. How do you choose them, and how do they affect performance?

apeterswu commented 2 years ago

Hi @Doragd,

Thanks for your interest. We follow the RoBERTa training setup, where the number of warmup steps is about 6% of the total number of training steps. So the numbers may look odd, but they are not chosen at random or tuned by hand.
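
As a rough illustration of the 6% rule, the sketch below derives a warmup step count from the total number of training steps. The function name `warmup_steps`, its parameters, and the example dataset size, batch size, and epoch count are all hypothetical and not taken from the R-Drop README. The rounding choices (ceiling for steps per epoch, truncation for the warmup count) are also assumptions, so the exact values in the README may have been computed slightly differently.

```python
import math


def warmup_steps(num_examples, batch_size, num_epochs, warmup_ratio=0.06):
    """Return (total_steps, warmup_steps) for a given warmup ratio.

    Hypothetical helper: assumes one optimizer step per batch
    (no gradient accumulation) and truncates the warmup count.
    """
    # Number of optimizer steps in one pass over the data.
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * num_epochs
    # Warmup is a fixed fraction (about 6%) of the total steps.
    return total_steps, int(total_steps * warmup_ratio)


if __name__ == "__main__":
    # Made-up task: 10,000 examples, batch size 32, 3 epochs.
    total, warmup = warmup_steps(10_000, 32, 3)
    print(f"total steps: {total}, warmup steps: {warmup}")
```

Because the total step count depends on dataset size, batch size, and number of epochs, a fixed ratio naturally produces a different, non-round warmup value for each task.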

Doragd commented 2 years ago

Thanks