Closed goopymoon closed 7 years ago
Thanks for your feedback! This line comes from the code example in chapter 4, page 118. It shows an implementation of Stochastic Gradient Descent, which considers only one training instance at a time, so the effective `minibatch_size` is 1 (and dividing by 1 changes nothing). If you want to implement Batch Gradient Descent or Mini-batch Gradient Descent, then indeed you should divide by the batch size.
It seems the division by `minibatch_size` is missing in the "Mini-batch gradient descent" sample script: `gradients = 2 * xi.T.dot(xi.dot(theta) - yi)` should be `gradients = 2/minibatch_size * xi.T.dot(xi.dot(theta) - yi)`.
If I run the modified script with a larger `n_iterations`, such as 500, I get a more stable learning curve.
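For reference, here is a minimal sketch of mini-batch gradient descent for linear regression with the `2/minibatch_size` scaling discussed above. It is not the book's exact script; the data generation, learning rate `eta`, and loop structure are illustrative assumptions.

```python
import numpy as np

# Illustrative setup (not from the book): y = 4 + 3x + Gaussian noise
np.random.seed(42)
m = 100
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)
X_b = np.c_[np.ones((m, 1)), X]          # add bias feature x0 = 1

n_iterations = 500
minibatch_size = 20
eta = 0.1                                # fixed learning rate, chosen for simplicity

theta = np.random.randn(2, 1)            # random initialization

for iteration in range(n_iterations):
    # reshuffle the training set each epoch
    shuffled_indices = np.random.permutation(m)
    X_shuffled = X_b[shuffled_indices]
    y_shuffled = y[shuffled_indices]
    for i in range(0, m, minibatch_size):
        xi = X_shuffled[i:i + minibatch_size]
        yi = y_shuffled[i:i + minibatch_size]
        # dividing by minibatch_size keeps the gradient scale
        # independent of the batch size
        gradients = 2 / minibatch_size * xi.T.dot(xi.dot(theta) - yi)
        theta = theta - eta * gradients

print(theta)  # should land near [[4], [3]]
```

Without the `1/minibatch_size` factor, the gradient magnitude grows with the batch size, so a learning rate tuned for one batch size can diverge at another.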