Why is warmup better than RAdam?

Tony-Y / pytorch_warmup

Learning Rate Warmup in PyTorch

https://tony-y.github.io/pytorch_warmup/

MIT License


Why is warmup better than RAdam? #2

Closed brando90 closed 3 years ago

brando90 commented 3 years ago

I've argued in https://github.com/LiyuanLucasLiu/RAdam/issues/62 that, if warmup and RAdam are equivalent, using RAdam might be simpler. However, I'd be curious to hear the arguments in favour of warmup presented in this repo and the related paper.

What are the reasons to choose warmup instead of RAdam?

Tony-Y commented 3 years ago

It is because the untuned linear warmup works well and is easy to implement.

brando90 commented 3 years ago

> It is because the untuned linear warmup works well and is easy to implement.

But RAdam requires no tuning. Doesn't that make it better than warmup?

Tony-Y commented 3 years ago

The untuned warmup, whose schedule depends only on beta2, requires no tuning either.
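
To illustrate what "untuned" means here, below is a minimal sketch in plain Python. It assumes the rule of thumb from the related paper: the learning rate is scaled linearly over roughly 2 / (1 - beta2) steps, so the warmup length follows from Adam's beta2 and is not a separate hyperparameter. The function name `untuned_linear_warmup_factor` and the values of `base_lr` and `beta2` are hypothetical choices for illustration, not this library's API.

```python
def untuned_linear_warmup_factor(step: int, beta2: float = 0.999) -> float:
    """Return the learning-rate dampening factor for a 1-based step.

    Assumption: the warmup period is 2 / (1 - beta2) steps, so the only
    input besides the step count is the optimizer's own beta2.
    """
    if not 0.0 <= beta2 < 1.0:
        raise ValueError("beta2 must be in [0, 1)")
    warmup_period = 2.0 / (1.0 - beta2)
    # Ramp linearly from near 0 up to 1, then stay at 1.
    return min(1.0, step / warmup_period)


if __name__ == "__main__":
    base_lr = 1e-3  # hypothetical base learning rate
    for step in (1, 500, 1000, 2000, 4000):
        lr = base_lr * untuned_linear_warmup_factor(step, beta2=0.999)
        print(f"step {step:>4}: lr = {lr:.6g}")
```

With beta2 = 0.999 the ramp lasts about 2000 steps. Changing beta2 changes the ramp length automatically, which is the sense in which nothing extra needs tuning.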