ageron / handson-ml3

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Apache License 2.0

[QUESTION] Chapter 4 Exercise Question 12 - cost function with l2 regularization seems incorrect #118

Open wowthecoder opened 11 months ago

wowthecoder commented 11 months ago

When attempting the question, there is a bonus part that asks you to add l2 regularization to the softmax regression code (In [75]): [screenshot of the notebook cell]

According to the book, in the section about Ridge Regression, we are supposed to add $\dfrac{\alpha}{m} \sum_{i=1}^{n} \theta_i^2$ (the sum of the *squared* parameters) to the original cost function. However, in line 2 of the screenshot above, l2_loss is computed with a factor of 1/2 at the front. Shouldn't it be 1/m instead?

[screenshot of the book's equation] According to the same section of the book, we should add $2\alpha\,\mathbf{w}/m$ to the MSE gradient vector. So in line 3 of the screenshot above, shouldn't it be `2 * alpha * Theta[1:] / m` instead?
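To make the suggestion concrete, here is a minimal NumPy sketch (my own, not the notebook's actual cell) of softmax regression cost and gradients using the book's ridge scaling, i.e. an `alpha / m` penalty and a `2 * alpha * w / m` gradient term, with the bias row `Theta[0]` left unregularized; the function name `cost_and_gradients` is mine:

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    exps = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exps / exps.sum(axis=1, keepdims=True)

def cost_and_gradients(Theta, X, Y_onehot, alpha=0.1, eps=1e-7):
    """Cross-entropy cost + l2 penalty with the book's Ridge scaling.

    X includes a leading bias column of 1s; Theta[0] (the bias row)
    is not regularized.
    """
    m = len(X)
    proba = softmax(X @ Theta)
    xent = -np.mean(np.sum(Y_onehot * np.log(proba + eps), axis=1))
    # Penalty scaled by alpha/m, as in the book's Ridge cost function.
    l2_loss = alpha / m * np.sum(np.square(Theta[1:]))
    grads = X.T @ (proba - Y_onehot) / m
    # Gradient of the penalty: 2 * alpha * w / m, skipping the bias row.
    grads[1:] += 2 * alpha * Theta[1:] / m
    return xent + l2_loss, grads
```

With this scaling, the penalty's gradient is exactly the derivative of the penalty term, which you can verify with a finite-difference check.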

Maybe this is why the validation loss suddenly increases so much once the regularization is applied.

If this is indeed a typo, the later sections involving the hyperparameter C would also have to be changed.

wowthecoder commented 11 months ago

Can someone clarify this?

tooniesnguyen commented 9 months ago

I'm also curious about the same thing. But I think the formula in the book, which is supposed to match scikit-learn's, may be the one that is wrong.
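For what it's worth, here is a small check (my own sketch, relying on scikit-learn's documented Ridge objective $\lVert y - Xw\rVert^2 + \alpha\lVert w\rVert^2$) suggesting that the book's $\alpha/m$ scaling of the penalty, combined with the $1/m$ inside the MSE, produces exactly the same minimizer as `sklearn.linear_model.Ridge` with the same `alpha`. Setting the gradient of $\mathrm{MSE}(w) + \frac{\alpha}{m}\lVert w\rVert^2$ to zero gives the normal equations $(X^\top X + \alpha I)\,w = X^\top y$:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
alpha = 1.0

# Closed-form minimizer of the book-style objective
# MSE(w) + (alpha/m) * ||w||^2  =>  (X'X + alpha*I) w = X'y
w_book = np.linalg.solve(X.T @ X + alpha * np.eye(3), X.T @ y)

# scikit-learn's Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2
ridge = Ridge(alpha=alpha, fit_intercept=False)
ridge.fit(X, y)

print(np.allclose(w_book, ridge.coef_))
```

If the two agree, the notebook's `1/2` factor would indeed be a different convention from the one the book's Ridge section uses, rather than a match for scikit-learn's `alpha`.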