Open ClarenceTeee opened 4 years ago
@ClarenceTee93 I have the same question; hope @eriklindernoren could shed some light on it. I've long been following Hands-On Machine Learning with ... by Aurélien Géron, and the equation used for Batch Gradient Descent in that book is:
(2 / training_size) * X_b.T.dot( X_b.dot(theta) - y ); this can be rewritten as (2/m) * X_b.T.dot( y_pred - y )
Even if we assume that the X used in @eriklindernoren's equation already has a bias term included for each sample, and that switching from (y_pred - y) to -(y - y_pred) makes sense, the multiplicative factor must still be included in the equation. To the best of my knowledge the math checks out if we carefully differentiate (y_pred - y)^2 w.r.t. each parameter, as sketched below.
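For reference, here is a short worked derivation of where the 2/m factor comes from (just a sketch, writing the predictions as y_hat = X_b.dot(theta) to match the book's notation):

```latex
% MSE over m training samples, with \hat{y} = X_b \theta
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \bigl( \hat{y}^{(i)} - y^{(i)} \bigr)^2

% Differentiating w.r.t. a single parameter \theta_j:
\frac{\partial J}{\partial \theta_j}
  = \frac{2}{m} \sum_{i=1}^{m} \bigl( \hat{y}^{(i)} - y^{(i)} \bigr)\, x_j^{(i)}

% Collecting all partials into a vector gives the book's formula:
\nabla_\theta J(\theta)
  = \frac{2}{m}\, X_b^{\top} \bigl( \hat{y} - y \bigr)
  = -\frac{2}{m}\, X_b^{\top} \bigl( y - \hat{y} \bigr)
```

So the sign flip to -(y - y_pred) is fine; it is only the 2/m scaling that is dropped.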
Yes, it should be
grad_w = -(y - y_pred).dot(X) + self.regularization.grad(self.w)
in regression.py.
Should it be grad_w = -(y - y_pred).dot(X) * (1/training_size) + self.regularization.grad(self.w) instead?
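For what it's worth, here is a minimal standalone check (plain NumPy; the variable names m, X, y, w below are made-up example names, not the repo's actual code) comparing the analytic gradient with and without the scaling factor against a finite-difference gradient of the MSE, regularization omitted to isolate the scaling question:

```python
import numpy as np

np.random.seed(0)
m, n_features = 50, 3                          # m training samples, 3 features
X = np.random.randn(m, n_features)
true_w = np.array([1.5, -2.0, 0.5])
y = X.dot(true_w) + 0.1 * np.random.randn(m)
w = np.zeros(n_features)                       # point at which we evaluate the gradient

def mse(w):
    # Mean squared error, the loss the gradient should correspond to
    return np.mean((y - X.dot(w)) ** 2)

y_pred = X.dot(w)
grad_unscaled = -(y - y_pred).dot(X)             # as in the current line
grad_scaled = -(2.0 / m) * (y - y_pred).dot(X)   # with the 2/m factor from the MSE

# Central-difference gradient of the MSE for comparison
eps = 1e-6
grad_numeric = np.array([
    (mse(w + eps * np.eye(n_features)[j]) - mse(w - eps * np.eye(n_features)[j])) / (2 * eps)
    for j in range(n_features)
])

print(np.allclose(grad_scaled, grad_numeric, atol=1e-4))    # True
print(np.allclose(grad_unscaled, grad_numeric, atol=1e-4))  # False (off by a factor of m/2)
```

The unscaled gradient still points in the same direction, so gradient descent can still converge with a small enough learning rate, but without the 1/training_size factor the step size effectively depends on the number of samples.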