shaibagon closed this issue 6 years ago
Hi @shaibagon, a negative learning rate can (and does) occur during the late stages of training, when the learning rate decays to a value near zero and fluctuates around it. Because the learning rate is fluctuating around zero, it takes both positive and negative values that are small in absolute value.
This is behavior we've observed in many problems, and it does not seem to cause destructive behavior. We don't yet have a theoretical analysis of this, but I suspect that the learning rate converges to zero in expectation.
The conference poster also shares this empirical finding: https://github.com/gbaydin/hypergradient-descent/raw/master/poster/iclr_2018_poster.pdf
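To see why the adapted rate can dip below zero, here is a minimal, self-contained sketch of the SGD-HD update in plain NumPy (illustrative names, not the repo's actual API): the learning rate itself is updated by a hypergradient term proportional to the dot product of consecutive gradients, and nothing in that update clamps it at zero.

```python
import numpy as np

def sgd_hd(grad_fn, theta, lr=0.0, hyper_lr=1e-3, steps=200):
    """Illustrative sketch of SGD with hypergradient descent (SGD-HD).

    The learning rate `lr` is itself adapted each step by a hypergradient
    term: hyper_lr * (g_t . g_{t-1}). Near convergence that dot product
    fluctuates in sign, so `lr` can take slightly negative values --
    there is no clamp at zero.
    """
    prev_grad = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        lr = lr + hyper_lr * np.dot(g, prev_grad)  # hypergradient step
        theta = theta - lr * g                     # ordinary SGD step
        prev_grad = g
    return theta, lr

# Toy quadratic: f(x) = 0.5 * ||x||^2, gradient is x.
theta, lr = sgd_hd(lambda x: x, np.array([5.0]))
```

Starting from `lr = 0` (as in the question below), the rate first grows while consecutive gradients are aligned, then oscillates once the iterate starts overshooting.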
@gbaydin Thank you very much for clarifying this matter. Closing this as "non-issue".
Very interesting work and thanks for sharing your implementation.
I was trying the `SGDHD` optimizer on a learning task, and while inspecting the actual learning rate (the value of `optimizer.param_groups[0]['lr']`) I noticed that it sometimes becomes negative (e.g., `-9.65e-06`). Although the value is very small in magnitude, it is still very strange to see a negative learning rate. Shouldn't the code cap the value at zero? Is this normal behavior?
BTW, I started with `lr: 0` and `hypergrad_lr: 1e-8`.
Thanks! -Shai
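The sign of the rate update comes straight from the hypergradient: for one SGD step θ' = θ − α·g, the derivative of f(θ') with respect to α is −∇f(θ')·g, which can be positive or negative. A short NumPy check of this identity on a toy quadratic (illustrative objective, not from the repo):

```python
import numpy as np

# Toy quadratic objective f(x) = 0.5 * ||x||^2 with gradient x
# (an illustrative choice, not part of the repo).
def f(x):
    return 0.5 * np.dot(x, x)

def grad_f(x):
    return x

theta = np.array([3.0, -1.0])
alpha = 0.1
g = grad_f(theta)

# Analytic hypergradient of f(theta - alpha * g) with respect to alpha:
# by the chain rule it is -grad f(theta') . g.
theta_next = theta - alpha * g
analytic = -np.dot(grad_f(theta_next), g)

# Central finite-difference check of the same derivative.
eps = 1e-6
fd = (f(theta - (alpha + eps) * g) - f(theta - (alpha - eps) * g)) / (2 * eps)
```

When this derivative is negative, hypergradient descent pushes α up; when positive, it pushes α down — and near a minimum the sign flips back and forth, which is why α can momentarily cross zero rather than being capped there.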