davisking / dlib

A toolkit for making real world machine learning and data analysis applications in C++
http://dlib.net
Boost Software License 1.0

Strange behavior of the loss depending on the learning rate #610

Closed konstantin-doncov closed 7 years ago

konstantin-doncov commented 7 years ago

I'm trying to train my own neural net using dlib, so I defined my own loss class and now I'm trying to test it. But I'm seeing some strange behavior. First, I have to set the initial learning rate to 1e-8 instead of 1e-1 (the value you suggest), because with any larger learning rate the loss becomes NaN after a few iterations. Is this normal? Second, sometimes when the learning rate is reduced, e.g. from 1e-10 to 1e-11, the loss increases sharply and afterwards never drops back to the level it had at the previous learning rate (it can even keep increasing). What does this mean and how can I fix it? Regards!
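A minimal sketch of the kind of training setup I mean (assuming a standard `dnn_trainer`; the tiny network, the `loss_mean_squared` layer, and the synthetic data here are just placeholders for my custom loss class and real dataset, and the learning-rate values are the ones mentioned above):

```cpp
#include <dlib/dnn.h>
#include <vector>

using namespace dlib;

// Placeholder network: a single fully connected output with a built-in loss.
// In my real code the loss layer is my own custom loss class.
using net_type = loss_mean_squared<fc<1, input<matrix<float>>>>;

int main()
{
    // Tiny synthetic dataset (y = 2*x) so the sketch actually runs.
    std::vector<matrix<float>> samples;
    std::vector<float> labels;
    for (int i = 0; i < 1000; ++i)
    {
        matrix<float> x(1,1);
        x = i / 1000.0f;
        samples.push_back(x);
        labels.push_back(2.0f * (i / 1000.0f));
    }

    net_type net;
    dnn_trainer<net_type> trainer(net);

    // The suggested starting point is 1e-1, but I have to use 1e-8
    // or the loss becomes NaN after a few iterations.
    trainer.set_learning_rate(1e-8);
    trainer.set_min_learning_rate(1e-12);
    trainer.set_learning_rate_shrink_factor(0.1); // e.g. 1e-10 -> 1e-11
    trainer.set_mini_batch_size(32);
    trainer.be_verbose();

    trainer.train(samples, labels);
}
```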

davisking commented 7 years ago

I don't know. Maybe your network definition is weird and hard to optimize, maybe your data is funny. Getting DNNs to optimize can be hard. You should refer to the academic literature to get some ideas about what works and what doesn't.

konstantin-doncov commented 7 years ago

@davisking OK, but I want to note that the loss does decrease up until the learning rate is reduced. Once the learning rate drops past a certain point (e.g. from 1e-10 to 1e-11), the loss stops decreasing. So I suspect this could be due to the limited precision of the C++ floating point types (e.g. double): the gradient multiplied by such a tiny step size (learning rate) produces extremely small, inaccurate numbers that can no longer move the weights in the right direction, so the loss stops decreasing as well. What do you think about this, and what should I do if it is true?
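To illustrate what I mean about precision (a toy example, assuming the parameters are stored as 32-bit floats as dlib does for network weights): once learning_rate * gradient falls below the spacing between representable floats around the weight's value, the update is rounded away entirely and the weight never moves.

```cpp
#include <cstdio>

int main()
{
    float weight   = 0.5f;   // a typical parameter value
    float gradient = 1.0f;   // pretend gradient of the loss w.r.t. this weight

    // With a "normal" learning rate the update actually changes the weight.
    float updated1 = weight - 1e-5f * gradient;

    // With a tiny learning rate the update (1e-11) is far smaller than the
    // spacing between adjacent floats near 0.5 (about 6e-8), so the
    // subtraction rounds back to exactly 0.5 and the weight does not move.
    float updated2 = weight - 1e-11f * gradient;

    std::printf("step 1e-5 : %.10f (changed: %d)\n", updated1, updated1 != weight);
    std::printf("step 1e-11: %.10f (changed: %d)\n", updated2, updated2 != weight);
}
```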