Closed I159 closed 6 years ago
Stochastic gradient descent is overfitting the NN very quickly. The first several tens of data items produce a real descent, but after that the loss stays between 0.33 and 0.67, which leads to very weak recognition.
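Per-sample SGD loss is noisy, so a loss bouncing between 0.33 and 0.67 may be easier to judge after smoothing. Below is a minimal sketch, not taken from the project's code, of one way to check whether the loss has really stopped descending. The function names `moving_average` and `has_plateaued`, the window size, and the tolerance are all hypothetical choices for illustration.

```python
from collections import deque


def moving_average(losses, window=50):
    """Return the running mean of the last `window` loss values at each step."""
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for loss in losses:
        if len(buf) == window:
            # The oldest value is about to be evicted by append(); drop it from the sum.
            total -= buf[0]
        buf.append(loss)
        total += loss
        out.append(total / len(buf))
    return out


def has_plateaued(losses, window=50, tol=0.01):
    """True if the smoothed loss moved by less than `tol` over the last `window` steps."""
    smoothed = moving_average(losses, window)
    if len(smoothed) < 2 * window:
        # Not enough history to compare two full windows.
        return False
    return abs(smoothed[-1] - smoothed[-1 - window]) < tol
```

If the list of per-item losses recorded during training is passed to `has_plateaued` and it returns `True` while the smoothed value is still high, the network has stalled rather than converged, and things like the learning rate or the order in which data items are presented may be worth checking.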