danijar / layered

Clean implementation of feedforward neural networks
MIT License

Average or just add up gradients in batch backpropagation? #1

Closed danijar closed 8 years ago

danijar commented 8 years ago

Currently, large batch sizes require a large learning rate. Maybe that's because I average the gradients over the batch. Find out whether gradients are usually summed or averaged in batch backpropagation.

danijar commented 8 years ago

Randall & Martinez, 2013 imply that both are common. While the sum is mathematically more correct, the average is more practical: it effectively divides the learning rate by the batch size, which keeps the step size independent of the batch size and makes batch gradient descent more stable.
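A minimal numpy sketch of the difference (the names `grads` and `lr` are hypothetical, not this library's API): the two conventions differ only by a factor of the batch size.

```python
import numpy as np

# Hypothetical per-example gradients for one batch: shape (batch_size, num_params).
batch_size, num_params = 32, 10
grads = np.random.randn(batch_size, num_params)
lr = 0.1

# Averaging: the step size is independent of the batch size.
step_avg = lr * grads.mean(axis=0)

# Summing: steps grow with the batch size; identical to averaging
# with a learning rate of lr * batch_size.
step_sum = lr * grads.sum(axis=0)

assert np.allclose(step_sum, batch_size * step_avg)
```

So summing with a fixed learning rate takes steps that scale linearly with the batch size, which matches the observation above that large batches would otherwise need a rescaled learning rate.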

Just keep the implementation as it is and add a docstring describing which convention is used.
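A sketch of what such a docstring might say; the function name `gradient` and its signature are hypothetical, not the library's actual API:

```python
def gradient(examples):
    """Return the cost gradient for a batch of examples.

    Per-example gradients are averaged rather than summed, so the
    effective step size does not grow with the batch size and the
    same learning rate works across batch sizes.
    """
```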