Closed: ArneNx closed this issue 2 years ago.
The current implementation expects the use of an LR scheduler like the one above. Did you use any LR scheduler?
I'm using `MultiStepLR`. I don't see how the choice of LR scheduler would change this behavior.
Could you try `lr_scheduler.step(lr_scheduler.last_epoch+1)` if you use PyTorch 1.4 or higher?
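For context, here is a minimal sketch of how that call might sit in a training loop together with this library's warmup scheduler. The `LinearWarmup(optimizer, warmup_period=...)` constructor, the `dampen()` call, and the exact call order are assumptions based on this thread, not a verbatim copy of the README:

```python
import torch
import pytorch_warmup as warmup

# Hypothetical model/optimizer purely for illustration.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60], gamma=0.1)
warmup_scheduler = warmup.LinearWarmup(optimizer, warmup_period=5)  # assumed signature

for step in range(100):
    # Closed (non-chainable) form: passing an explicit step makes MultiStepLR
    # recompute group['lr'] from the base LRs, so the previous dampening does
    # not accumulate across iterations.
    lr_scheduler.step(lr_scheduler.last_epoch + 1)
    # Dampening then multiplies the freshly reset group['lr'] by omega <= 1.
    warmup_scheduler.dampen()

    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).pow(2).mean()
    loss.backward()
    optimizer.step()
```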
I tried this earlier. It didn't change anything.
What version of PyTorch did you use?
It's 1.4
There are two approaches explained in the README. Which approach did you employ?
The returned warmup factor `omega` is always less than 1, and the operation `group['lr'] *= omega` makes the learning rate decrease step by step. Keep in mind that `group['lr']` must be reset by an LR scheduler in the closed (non-chainable) form before the multiplication by `omega`.
Explanation of chainable schedulers: https://github.com/pytorch/pytorch/pull/26423
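To make the chainable vs. closed-form distinction concrete, here is a small illustrative snippet (assuming PyTorch 1.4's `MultiStepLR`; it is not taken from the library's code):

```python
import torch

# In the chainable form, MultiStepLR.step() leaves group['lr'] untouched
# between milestones, so an external `group['lr'] *= omega` persists and
# compounds. The closed form, step(epoch), recomputes group['lr'] from
# base_lrs and therefore undoes the previous dampening.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100], gamma=0.1)

optimizer.param_groups[0]['lr'] *= 0.5        # simulate one dampening step
scheduler.step()                              # chainable: lr stays at 0.05
print(optimizer.param_groups[0]['lr'])
scheduler.step(scheduler.last_epoch + 1)      # closed form: lr reset to 0.1
print(optimizer.param_groups[0]['lr'])
```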
Thx, that may be the point.
Testing out `LinearWarmup` and `ExponentialWarmup`, I noticed the strange behavior that the learning rate did not rise during the `warmup_period`. Instead, it decreased. Looking at the code, I noticed that the current, not the initial, learning rate was always multiplied by the dampening factor, which led to the decrease. See below my implementation to produce the behavior I expected to see.
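(The original snippet is not included here.) As a rough illustration of the reported behavior, and not the author's code, the following toy calculation contrasts repeatedly dampening the current LR with dampening the initial LR; the exact warmup-factor formula in the library may differ:

```python
# Toy numbers only: compare repeatedly dampening the *current* LR (what the
# issue describes) with dampening the *initial* LR, for a 5-step linear warmup.
warmup_period = 5
base_lr = 0.1

current_lr = base_lr                          # mimics `group['lr'] *= omega` with no scheduler reset
for step in range(1, warmup_period + 1):
    omega = min(1.0, step / warmup_period)    # assumed linear warmup factor, omega <= 1
    current_lr *= omega                       # keeps shrinking: 0.02, 0.008, 0.0048, ...
    expected_lr = base_lr * omega             # rises toward base_lr: 0.02, 0.04, 0.06, ...
    print(step, round(current_lr, 6), round(expected_lr, 6))
```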