Closed — konstantin-doncov closed this issue 7 years ago
I don't know. Maybe your network definition is weird and hard to optimize, maybe your data is funny. Getting DNNs to optimize can be hard. You should refer to the academic literature to get some ideas about what works and what doesn't.
@davisking Ok, but I want to note that the loss does decrease before the learning rate is reduced. However, once the learning rate shrinks past some point (e.g. from `1e-10` to `1e-11`), my loss stops decreasing.
So I think it could be due to insufficient precision of some C++ type (e.g. `double`): the gradient multiplied by a very small step size (learning rate) yields numbers so tiny that they are rounded away or distorted, so gradient descent can no longer move in the correct direction, and the loss stops decreasing.
What do you think about this and what should I do if this is true?
I'm trying to train my own neural net using dlib, so I defined my loss class and now I'm trying to test it. But I get some strange behavior. First, I need to set the initial learning rate to `1e-8` instead of `1e-1` (suggested by you), because with any larger learning rate I get a `nan` loss after a few iterations; is this normal? Second, sometimes when the learning rate changes, e.g. from `1e-10` to `1e-11`, my loss increases sharply and afterwards does not come back down to the level it reached at the previous learning rate (it can even keep increasing). What does this mean and how can I fix it? Regards!