titu1994 closed this issue 5 years ago
Thanks~ I've updated the code.
Hmm, your implementation is slightly off: base_lr is always set to the initial lr (1e-3 in this case).
Edit: What I meant to say is that base_lr shouldn't be a parameter, because it should always default to the initial lr.
The scaling just applies a linear multiplier of (1, 0.1, 0.01, ...) etc., so the bounds drop by the same factors as the learning rate does.
The base_lr argument is only there for loading (restoring the optimizer from a saved model).
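As a rough illustration of the idea (a minimal sketch, not the actual optimizer code; `BoundScaler` and its methods are hypothetical names), freezing base_lr at construction time and dividing the current lr by it yields exactly that multiplier sequence:

```python
# Minimal sketch: freeze base_lr at the initial lr instead of exposing it
# as a user-tunable parameter. All names here are hypothetical.
class BoundScaler:
    def __init__(self, lr=1e-3, final_lr=0.1):
        self.base_lr = lr          # always the initial lr, never user-supplied
        self.final_lr = final_lr

    def decay_multiplier(self, current_lr):
        # Approximately 1.0 at the start, then 0.1, 0.01, ... as external
        # LR decay shrinks current_lr by the same factors.
        return current_lr / self.base_lr


scaler = BoundScaler(lr=1e-3)
for lr in (1e-3, 1e-4, 1e-5):
    print(scaler.decay_multiplier(lr))  # ~1.0, ~0.1, ~0.01
```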
I just wanted to point out that without the base-lr scaling done at this line, https://github.com/Luolc/AdaBound/blob/master/adabound/adabound.py#L110, your optimizer will not properly lower the bounds when LR decay is applied, whether via the LearningRateScheduler or ReduceLROnPlateau callbacks.
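For reference, the logic at the linked line can be sketched like this (paraphrased from the PyTorch AdaBound source, so treat the exact names as approximate):

```python
def adabound_bounds(current_lr, base_lr, final_lr, gamma, step):
    # Paraphrase of the linked line: final_lr is rescaled by
    # current_lr / base_lr, so external LR decay also tightens the bounds.
    scaled_final_lr = final_lr * current_lr / base_lr
    lower = scaled_final_lr * (1 - 1 / (gamma * step + 1))
    upper = scaled_final_lr * (1 + 1 / (gamma * step))
    return lower, upper


# With a 10x LR decay (1e-3 -> 1e-4), both bounds shrink by the same 10x.
# Drop the current_lr / base_lr factor and they would stay fixed instead.
print(adabound_bounds(1e-3, 1e-3, final_lr=0.1, gamma=1e-3, step=1000))
print(adabound_bounds(1e-4, 1e-3, final_lr=0.1, gamma=1e-3, step=1000))
```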