Tony-Y / pytorch_warmup

Learning Rate Warmup in PyTorch
https://tony-y.github.io/pytorch_warmup/
MIT License

A BUG in BaseWarmup? #26

Closed · Moon0316 closed this issue 1 week ago

Moon0316 commented 2 weeks ago

https://github.com/Tony-Y/pytorch_warmup/blob/527bb7561a2f7b6a98427656a4e014259cfc8850/pytorch_warmup/base.py#L26

I monitor the lr during training, and I was surprised to find that the stable lr after warmup depends on the number of warmup steps. I discovered that there is a dampen operation in the init function of BaseWarmup, which permanently changes the lr (to lr/warmup_steps) inside the original optimizer. This logic is strange and confusing, and I wonder if it is a bug. Maybe the dampen operation in the init function should be deleted?
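
Here is a minimal sketch of what I observe (the toy parameter and the use of warmup.LinearWarmup with warmup_period=10 are just illustrative assumptions):

import torch
import pytorch_warmup as warmup

p = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([p], lr=0.1)
print(optimizer.param_groups[0]['lr'])  # 0.1, the original lr

# Constructing the warmup object already dampens the optimizer's lr,
# before any training step has been taken.
warmup_scheduler = warmup.LinearWarmup(optimizer, warmup_period=10)
print(optimizer.param_groups[0]['lr'])  # roughly lr / warmup_period, i.e. 0.01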

Moon0316 commented 2 weeks ago

I initialize the scheduler after the warmup object, so if the warmup changes the lr in its init function, the scheduler records a wrong initial lr. This then causes a problem when I call:

with warmup.dampening():
    scheduler.step()

scheduler.step() will then cause self.lrs in the Warmup object to become wrong.

Tony-Y commented 2 weeks ago

The initial LR must be dampened inside the init function. Do not init the scheduler after the warmup.
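
For reference, here is a minimal sketch of the intended ordering (the AdamW optimizer, ExponentialLR, warmup_period value, and toy model are only placeholders):

import torch
import pytorch_warmup as warmup

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Initialize the LR scheduler first, so it records the undampened base lr,
# and only then initialize the warmup object.
lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)
warmup_scheduler = warmup.LinearWarmup(optimizer, warmup_period=2000)

for step in range(10000):  # stand-in for iterating over a real data loader
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).pow(2).mean()  # toy loss
    loss.backward()
    optimizer.step()
    with warmup_scheduler.dampening():
        lr_scheduler.step()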

Moon0316 commented 2 weeks ago

> The initial LR must be dampened inside the init function. Do not init the scheduler after the warmup.

Sorry, I don't understand. The LR will be dampened as soon as the dampening function is called for the first time. Why must it also be dampened inside the init function?

Tony-Y commented 2 weeks ago

This way is the same as the LR scheduler that calls self.step() inside the init function: https://github.com/pytorch/pytorch/blob/db393fb95e5b057ca49472828bb6dba2db4f859e/torch/optim/lr_scheduler.py#L146-L151

In PyTorch 1.0 or earlier, we must call scheduler.step() before optimizer.step(): http://pytorch.org/docs/1.0.0/optim.html#how-to-adjust-learning-rate
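
For context, that pre-1.1 pattern looked roughly like this (the toy model, SGD optimizer, and StepLR are just placeholders for illustration):

import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    scheduler.step()  # in PyTorch <= 1.0, the scheduler was stepped first
    for _ in range(100):  # stand-in for the batches of one epoch
        optimizer.zero_grad()
        loss = model(torch.randn(8, 10)).pow(2).mean()
        loss.backward()
        optimizer.step()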

Refresh your thought.

Moon0316 commented 2 weeks ago

> This way is the same as the LR scheduler that calls self.step() inside the init function: https://github.com/pytorch/pytorch/blob/db393fb95e5b057ca49472828bb6dba2db4f859e/torch/optim/lr_scheduler.py#L146-L151
>
> In PyTorch 1.0 or earlier, we must call scheduler.step() before optimizer.step(): http://pytorch.org/docs/1.0.0/optim.html#how-to-adjust-learning-rate
>
> Refresh your thought.

So the only difference between these two methods (whether or not the LR is dampened in the init function) is whether the optimizer uses the original LR or the dampened LR at step 0? In the following steps, the optimizer would adopt the same LR under either method. Am I right?

Tony-Y commented 2 weeks ago

The LR index differs by one step, so the LR differs throughout the warmup period.
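
In other words (a toy illustration in plain Python, not library code, assuming the linear factor min(1, (t+1)/warmup_period) that gives lr/warmup_steps at step 0):

warmup_period = 5

def factor(step):
    # Linear warmup factor: 1/warmup_period at step 0, reaching 1 at the end.
    return min(1.0, (step + 1) / warmup_period)

# Dampening in the init function: the very first optimizer.step() already
# uses the step-0 factor.
with_init_dampening = [factor(t) for t in range(warmup_period + 1)]
# Without it, the first optimizer.step() would use the undampened lr and the
# ramp would be shifted by one step, so the LR differs at every warmup step.
without_init_dampening = [1.0] + [factor(t) for t in range(warmup_period)]

print(with_init_dampening)     # [0.2, 0.4, 0.6, 0.8, 1.0, 1.0]
print(without_init_dampening)  # [1.0, 0.2, 0.4, 0.6, 0.8, 1.0]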

Moon0316 commented 2 weeks ago

> The LR index differs by one step, so the LR differs throughout the warmup period.

I understand. Thank you for your answer and your contributions.

Maybe there could be a reminder in the documentation that the scheduler should be initialized before the warmup object.

Tony-Y commented 1 week ago

Thank you for your suggestion. I have just updated the README.