dfdazac / machine-learning-1

Code and notebooks on machine learning
MIT License

Is that really SGD? #1

Open romilly opened 3 years ago

romilly commented 3 years ago

I may be missing something, but it looks as if you're doing full gradient descent (i.e. using the entire batch) in the SGD class. SGD should use just a single sample selected at random.
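To make the distinction concrete, here is a rough sketch of what I mean (not this repo's code; grad_fn stands in for whatever computes the gradient):

```python
import numpy as np

def full_gd_step(w, X, y, lr, grad_fn):
    # Full (batch) gradient descent: the gradient is computed
    # over the entire training set at every step.
    return w - lr * grad_fn(w, X, y)

def sgd_step(w, X, y, lr, grad_fn):
    # "Textbook" SGD: the gradient is estimated from a single
    # example drawn at random.
    i = np.random.randint(len(X))
    return w - lr * grad_fn(w, X[i:i + 1], y[i:i + 1])
```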

dfdazac commented 3 years ago

Hi @romilly, thanks for the question. The SGD class that I implemented is mainly concerned with the update rule, implemented in the step function:

https://github.com/dfdazac/machine-learning-1/blob/0beb7c098aa8b16689075822f76bc1b4fd38dedf/neural_networks/optimizers.py#L11-L14

The SGD class is intended to encapsulate this update rule, which differs from the rules of other optimizers that, for example, add momentum or per-parameter learning rates.
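For reference, the update rule in question is plain gradient descent on the layer's parameters. A minimal sketch of such a class (assuming the layer stores its parameters in W and b and its gradients in dW and db; not a verbatim copy of optimizers.py):

```python
class SGD:
    """Plain gradient-descent update rule for a single linear layer."""

    def __init__(self, layer, lr=0.1):
        self.layer = layer
        self.lr = lr

    def step(self):
        # Apply W <- W - lr * dW and b <- b - lr * db.
        # No momentum and no per-parameter learning rates: the class
        # only encapsulates the vanilla update rule.
        self.layer.W -= self.lr * self.layer.dW
        self.layer.b -= self.lr * self.layer.db
```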

This is also a very specific optimizer, because I explicitly designed it to optimize the weights of a linear layer, using the gradients in layer.dW and layer.db. Even then, the class can be used independently of the number of samples, because the gradients dW and db always have the same shape as the parameters, and they are computed in a separate class (NNClassifier), where they can be computed from one sample, a mini-batch of samples, or even the full training set.
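In other words, the optimizer never sees how many samples produced the gradients. A quick shape check illustrating this (a toy example with made-up dimensions, independent of the actual NNClassifier code):

```python
import numpy as np

# Hypothetical linear layer with 3 inputs and 2 outputs.
in_features, out_features = 3, 2
W = np.zeros((in_features, out_features))

for batch_size in (1, 16, 1000):  # single sample, mini-batch, "full" training set
    X = np.random.randn(batch_size, in_features)
    upstream = np.random.randn(batch_size, out_features)
    # Gradient of the loss w.r.t. W for a linear layer, averaged over the batch.
    dW = X.T @ upstream / batch_size
    assert dW.shape == W.shape  # same shape regardless of batch size
```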