Closed: ArneNx closed this issue 2 years ago.
The current implementation expects the use of an LR scheduler like the one above. Did you use any LR scheduler?
I'm using `MultiStepLR`. I don't see how the choice of LR scheduler would change this behavior.
Could you try `lr_scheduler.step(lr_scheduler.last_epoch+1)` if you use PyTorch 1.4 or higher?
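For context, here is a minimal sketch of how that call might sit in a training loop together with this library's warmup scheduler. The `LinearWarmup(optimizer, warmup_period=...)` constructor, the `dampen()` call, and the exact call order are assumptions based on this thread, not a verbatim copy of the README:

```python
import torch
import pytorch_warmup as warmup

# Hypothetical model/optimizer purely for illustration.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60], gamma=0.1)
warmup_scheduler = warmup.LinearWarmup(optimizer, warmup_period=5)  # assumed signature

for step in range(100):
    # Closed (non-chainable) form: passing an explicit step makes MultiStepLR
    # recompute group['lr'] from the base LRs, so the previous dampening does
    # not accumulate across iterations.
    lr_scheduler.step(lr_scheduler.last_epoch + 1)
    # Dampening then multiplies the freshly reset group['lr'] by omega <= 1.
    warmup_scheduler.dampen()

    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).pow(2).mean()
    loss.backward()
    optimizer.step()
```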
I tried this earlier. It didn't change anything.
What version of PyTorch did you use?
It's 1.4
There are two approaches explained in the README. Which approach did you employ?
The returned warmup factor `omega` is always less than 1, and the operation `group['lr'] *= omega` makes the learning rate decrease step by step. Keep in mind that `group['lr']` must be reset by an LR scheduler in the closed (non-chainable) form before the multiplication by `omega`.
Explanation of chainable schedulers: https://github.com/pytorch/pytorch/pull/26423
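To make the chainable vs. closed-form distinction concrete, here is a small illustrative snippet (assuming PyTorch 1.4's `MultiStepLR`; it is not taken from the library's code):

```python
import torch

# In the chainable form, MultiStepLR.step() leaves group['lr'] untouched
# between milestones, so an external `group['lr'] *= omega` persists and
# compounds. The closed form, step(epoch), recomputes group['lr'] from
# base_lrs and therefore undoes the previous dampening.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100], gamma=0.1)

optimizer.param_groups[0]['lr'] *= 0.5        # simulate one dampening step
scheduler.step()                              # chainable: lr stays at 0.05
print(optimizer.param_groups[0]['lr'])
scheduler.step(scheduler.last_epoch + 1)      # closed form: lr reset to 0.1
print(optimizer.param_groups[0]['lr'])
```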
Thx, that may be the point.
Testing out `LinearWarmup` and `ExponentialWarmup`, I noticed the strange behavior that the learning rate did not rise during the `warmup_period`. Instead, it decreased. Looking at the code, I noticed that the current, not the initial, learning rate was always multiplied by the dampening factor, which led to the decrease. See below my implementation to produce the behavior I expected to see.
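(The original snippet is not included here.) As a rough illustration of the reported behavior, and not the author's code, the following toy calculation contrasts repeatedly dampening the current LR with dampening the initial LR; the exact warmup-factor formula in the library may differ:

```python
# Toy numbers only: compare repeatedly dampening the *current* LR (what the
# issue describes) with dampening the *initial* LR, for a 5-step linear warmup.
warmup_period = 5
base_lr = 0.1

current_lr = base_lr                          # mimics `group['lr'] *= omega` with no scheduler reset
for step in range(1, warmup_period + 1):
    omega = min(1.0, step / warmup_period)    # assumed linear warmup factor, omega <= 1
    current_lr *= omega                       # keeps shrinking: 0.02, 0.008, 0.0048, ...
    expected_lr = base_lr * omega             # rises toward base_lr: 0.02, 0.04, 0.06, ...
    print(step, round(current_lr, 6), round(expected_lr, 6))
```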