Open zhangpengshan opened 5 years ago
The learning rate in LR is not divided by the number of training samples, so in Q or B propagation the learning rate has to be set very small, which makes it very difficult to tune.
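To illustrate the scaling problem, here is a minimal sketch. It assumes "LR" means logistic regression trained with plain gradient descent. It is not this project's code: `logistic_gradient` and its `average` flag are hypothetical names used only for illustration. When the gradient is summed over samples, its size grows with the training count, so a usable learning rate shrinks roughly in proportion. When the gradient is divided by the training count, the same learning rate works across dataset sizes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient(w, X, y, average):
    """Gradient of the logistic loss w.r.t. w (hypothetical helper).

    average=False sums over samples; average=True divides by the sample count.
    """
    residual = sigmoid(X @ w) - y      # shape (n_samples,)
    grad = X.T @ residual              # summed over all samples
    return grad / len(y) if average else grad

rng = np.random.default_rng(0)
w = np.zeros(3)
for n in (100, 10_000):
    X = rng.normal(size=(n, 3))
    y = (X[:, 0] > 0).astype(float)
    g_sum = logistic_gradient(w, X, y, average=False)
    g_mean = logistic_gradient(w, X, y, average=True)
    # The summed gradient norm grows with n; the averaged one stays comparable.
    print(n, np.linalg.norm(g_sum), np.linalg.norm(g_mean))
```

With the summed form, an update `w -= lr * g_sum` behaves like an averaged update with step size `lr * n`, which would explain why the learning rate has to be made very small on large training sets.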