Open lelezanardo opened 2 years ago
Hi @lelezanardo ,
Thanks for your feedback.
Remember that theta_ridge and gradients are both 2D arrays of shape [2, 1]; in other words, they're both column vectors. So when you add alpha * theta_ridge[1], you are actually adding a 1D array of shape [1], containing a single value, to a column vector. NumPy broadcasting then adds that single value to both elements of the gradients vector, which is not what you want: the bias term should not be regularized. Instead, you should add alpha * theta_ridge * [[0.], [1.]], which is equivalent to adding the vector [[0.], [alpha * theta_ridge[1, 0]]].
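Here's a quick way to see the difference (a small standalone sketch with made-up numbers):

import numpy as np

alpha = 0.1
theta_ridge = np.array([[3.], [5.]])        # column vector, shape (2, 1)
print(alpha * theta_ridge[1])               # shape (1,): a single value that would broadcast to both rows
print(alpha * theta_ridge * [[0.], [1.]])   # shape (2, 1): [[0.], [0.5]], the bias row stays unpenalized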
Moreover, Scikit-Learn's Ridge class actually minimizes the Sum of Squared Errors (SSE), not the Mean Squared Error, and it adds alpha * ||w||² to the loss rather than (1/2) * alpha * ||w||². Therefore, to get the same result as the Ridge class, you need to scale alpha by a factor of 2 / m.
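To see why the factor is 2 / m: the gradient of SSE(theta) + alpha * ||w||² is 2 * X_b.T.dot(X_b.dot(theta) - y) + 2 * alpha * w (where w is theta with the bias term zeroed out). Dividing the whole loss by m does not move its minimum, and it turns the gradient into the MSE gradient 2 / m * X_b.T.dot(X_b.dot(theta) - y) plus 2 * alpha / m * w, so the effective penalty coefficient becomes 2 * alpha / m.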
In short, here's the correct code:
# assumes X_b, y, m, alpha, n_iterations and an initial theta_ridge are already defined
eta = 0.1
for iteration in range(n_iterations):
    gradients = 2 / m * X_b.T.dot(X_b.dot(theta_ridge) - y)  # MSE gradient (plain linear regression)
    gradients += 2 * alpha / m * theta_ridge * [[0.], [1.]]  # add l2 penalty, excluding the bias term
    theta_ridge = theta_ridge - eta * gradients
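To check the result, you can compare it with Scikit-Learn directly; here's a minimal sketch, assuming X is the same training data without the bias column, and y, alpha and theta_ridge are the variables used above:

from sklearn.linear_model import Ridge

ridge_reg = Ridge(alpha=alpha, solver="cholesky")
ridge_reg.fit(X, y)
print(ridge_reg.intercept_, ridge_reg.coef_)  # should be close to theta_ridge[0] and theta_ridge[1]
print(theta_ridge)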
Alternatively, you could minimize the SSE like they do, but then you would have to divide the learning rate by m:
eta = 0.1 / m
for iteration in range(n_iterations):
    gradients = 2 * X_b.T.dot(X_b.dot(theta_ridge) - y)  # SSE gradient (plain linear regression)
    gradients += 2 * alpha * theta_ridge * [[0.], [1.]]  # add l2 penalty, excluding the bias term
    theta_ridge = theta_ridge - eta * gradients
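Both loops converge to the same parameters: the SSE gradients are exactly m times larger than the MSE gradients, and the smaller learning rate compensates for that factor.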
Here's a gist notebook with the first solution.
I'll update the book to make that clearer. Thanks again!
Ridge regression using Gradient Descent
Hi! I was trying to implement Ridge regression with gradient descent by adding alpha * theta to the MSE gradient vector (where theta is the parameter vector).
So I've used the following code:
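The snippet itself isn't reproduced here, but judging from the reply above it presumably looked roughly like this (hypothetical reconstruction; X_b, y, m, alpha, n_iterations and an initial theta_ridge assumed to be defined beforehand):

eta = 0.1
for iteration in range(n_iterations):
    # hypothetical reconstruction of the attempt described above
    gradients = 2 / m * X_b.T.dot(X_b.dot(theta_ridge) - y)  # MSE gradient
    gradients += alpha * theta_ridge[1]  # intended l2 penalty, but this broadcasts to both rows
    theta_ridge = theta_ridge - eta * gradients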
Expected behavior
I would expect the final parameter vector to be the same as the one I get from the Ridge class in sklearn.linear_model, but it is not:
I've also tried different solvers (even Cholesky, as reported in the book), but I always get different results.
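For reference, Ridge regression also has a closed-form solution (a variant of the Normal Equation with the l2 penalty added for every parameter except the bias term); a minimal sketch, assuming X_b (the training data with the bias column), y and alpha are defined as above:

import numpy as np

A = np.identity(X_b.shape[1])
A[0, 0] = 0.  # don't regularize the bias term
theta_closed_form = np.linalg.inv(X_b.T.dot(X_b) + alpha * A).dot(X_b.T).dot(y)
print(theta_closed_form)  # should roughly match the Ridge class's intercept_ and coef_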