Closed: TtCWH closed this issue 1 month ago.
I have solved this problem: the --no-load-optim flag should be set if you don't want your optimizer to load its state from the checkpoint.
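For anyone hitting the same thing, here is a minimal sketch of what skipping optimizer-state loading amounts to. This is plain PyTorch, not Megatron-LM's actual checkpointing code; the names ckpt_path and no_load_optim here are illustrative assumptions.

```python
# Minimal sketch of the effect of a "don't load optimizer" switch.
# NOT Megatron-LM internals; `ckpt_path` / `no_load_optim` are assumed names.
import torch

def resume(model, optimizer, ckpt_path, no_load_optim=False):
    ckpt = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    if not no_load_optim:
        # Restoring optimizer state also restores the saved per-group lr,
        # which is what clobbers a command-line lr override.
        optimizer.load_state_dict(ckpt["optimizer"])
    # With no_load_optim=True, the optimizer keeps the lr it was
    # constructed with, i.e. the second-stage setting.
```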
I think this issue should be reconsidered. There should be a way to override the learning-rate scheduler while still loading the optimizer state.
Describe the bug
I have completed the first stage of model training. The first-stage settings were lr=3e-4 and min_lr=3e-5; the second-stage settings are lr=3e-5 and min_lr=2e-5. I also enabled the three flags --reset-dataloader --override-opt_param-scheduler --reset-iteration. In the output log, lr and min_lr were indeed overridden to the second-stage settings. However, after the first-stage checkpoint was loaded and training started, lr changed back to 3e-4 (see the sketch below).
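The mechanism can be reproduced in plain PyTorch, outside Megatron-LM entirely: restoring an optimizer state dict also restores the per-group learning rate saved with it, overwriting a value set earlier.

```python
# Plain-PyTorch illustration of the reported mechanism (not Megatron-LM's
# actual loader): an lr override applied at construction time is undone
# when the stage-one optimizer state dict is restored.
import torch

model = torch.nn.Linear(4, 4)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)  # stage-one lr
ckpt = {"optimizer": opt.state_dict()}                # "stage-one checkpoint"

# Second stage: construct the optimizer with the overridden lr ...
opt2 = torch.optim.AdamW(model.parameters(), lr=3e-5)
print(opt2.param_groups[0]["lr"])  # 3e-05, matching the overridden log output

# ... but loading the checkpoint drags the old hyperparameters back in.
opt2.load_state_dict(ckpt["optimizer"])
print(opt2.param_groups[0]["lr"])  # 3e-04 again
```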
To Reproduce
1. Train the first stage with lr=3e-4 and min_lr=3e-5 and save a checkpoint.
2. Configure the second stage with lr=3e-5 and min_lr=2e-5, plus --reset-dataloader --override-opt_param-scheduler --reset-iteration.
3. Resume from the first-stage checkpoint and start training.
Expected behavior
In step 3 above, once training starts, the learning rate at the first step should be 3e-5.
Stack trace/logs
None.
Environment
Not provided.
Proposed fix
Set the overridden args after loading the checkpoint, so that the values restored from the checkpoint cannot clobber the command-line overrides.
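A minimal sketch of that idea in plain PyTorch, assuming an args namespace holding the parsed command-line values; Megatron-LM's real loader and scheduler would need the equivalent change. This also addresses the comment above: the optimizer state (momentum, variance) is still loaded, while the learning rate is re-asserted afterwards.

```python
# Sketch of the proposed fix (assumptions: `args` is the parsed namespace;
# this is not Megatron-LM's actual load_checkpoint code).
def load_then_override(optimizer, ckpt, args):
    # Load the optimizer state first, keeping momentum/variance buffers ...
    optimizer.load_state_dict(ckpt["optimizer"])
    # ... then re-assert the command-line hyperparameters so they win.
    for group in optimizer.param_groups:
        group["lr"] = args.lr
    # min_lr lives in the lr scheduler rather than the param groups in
    # Megatron-LM, so it would need the same post-load override there.
```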