For linear regression, the l2-regularization term is gamma * np.sqrt(beta @ beta)
The gradient of l2 penalty wrt beta is then simply gamma * beta
Keep in mind that d_penality is the gradient of the penalty term wrt the coefficients, not the penalty itself :)
I don't use a special IDE, unfortunately. The equations are formatted for display as Sphinx reStructuredText. You can see the rendered equations in the online documentation, or build it yourself from the source in the docs
directory. There may also be IDE plugins that will try to render them, but I am not aware of any :)
@ddbourgin Thank you for the reply.
From https://towardsdatascience.com/intuitions-on-l1-and-l2-regularisation-235f2db4c261
The l1-regularization term is gamma * np.absolute(beta).sum()
The l2-regularization term is gamma * np.power(np.sqrt(beta @ beta), 2)
(I think you miswrote it in the previous comment.)
The gradient of the l1 penalty wrt beta is then gamma * np.sign(beta), and the gradient of the l2 penalty wrt beta is 2 * gamma * beta, which is proportional to gamma * beta.
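A minimal numpy sketch of those two penalty terms and their analytic gradients (gamma and beta here are just placeholder values, not anything from the library):

```python
import numpy as np

gamma = 0.1
beta = np.array([0.5, -2.0, 3.0])

# L1 penalty from the article: gamma * sum(|beta_i|)
l1_penalty = gamma * np.absolute(beta).sum()
l1_grad = gamma * np.sign(beta)                  # gradient wrt beta

# L2 penalty from the article: gamma * (sqrt(beta @ beta)) ** 2 == gamma * (beta @ beta)
l2_penalty = gamma * np.power(np.sqrt(beta @ beta), 2)
l2_grad = 2 * gamma * beta                       # proportional to gamma * beta

print(l1_penalty, l1_grad)
print(l2_penalty, l2_grad)
```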
Actually, I thought the l2-regularization term was gamma * np.sqrt(beta @ beta), so I assumed the gradient of the l2 term would be +-1 as well. In my head the L2 norm was sometimes beta^2 and sometimes np.sqrt(beta^2); the l2 norm and the l2-regularization term look so similar that I kept mixing them up, but now I have it straight.
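To make that distinction concrete, here is a tiny sketch (placeholder values) separating the L2 norm itself from the L2 regularization term that gets added to the loss:

```python
import numpy as np

gamma = 0.1
beta = np.array([0.5, -2.0, 3.0])

l2_norm = np.sqrt(beta @ beta)             # ||beta||_2, same as np.linalg.norm(beta)
l2_penalty = 0.5 * gamma * l2_norm ** 2    # the regularization term added to the loss
l2_grad = gamma * beta                     # gradient of the penalty wrt beta (not +-1)
```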
But one problem remains: why do you multiply by l1norm(beta) in the L1 case? Since the gradient of the l1 penalty is gamma * np.sign(beta), this confuses me.
Whoops, yup, that's what I get for being hasty! The regularization penalty is (gamma / 2) * np.sqrt(beta @ beta) ** 2
, which gives a gradient of gamma * beta.
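A quick finite-difference check of that gradient (placeholder values, not code from the library):

```python
import numpy as np

gamma, eps = 0.1, 1e-6
beta = np.array([0.5, -2.0, 3.0])

penalty = lambda b: 0.5 * gamma * np.sqrt(b @ b) ** 2   # (gamma / 2) * ||beta||_2^2

# central differences along each coordinate should match the analytic gradient gamma * beta
num_grad = np.array([
    (penalty(beta + eps * e) - penalty(beta - eps * e)) / (2 * eps)
    for e in np.eye(len(beta))
])
print(np.allclose(num_grad, gamma * beta))  # True
```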
In the L1 case, I'd recommend explicitly writing down the L1 penalty (not just the l1 norm) and then trying to derive the gradient wrt beta. It should quickly become clear why there is an l1norm
term in the calc :)
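For instance, if the penalty as currently implemented is the squared L1 norm, 0.5 * gamma * l1norm(beta) ** 2, the chain rule gives a gradient of gamma * l1norm(beta) * np.sign(beta), which is where the extra l1norm factor comes from. A small sketch (placeholder values) contrasting it with the plain L1 penalty:

```python
import numpy as np

gamma = 0.1
beta = np.array([0.5, -2.0, 3.0])
l1norm = np.abs(beta).sum()

# squared-L1 penalty: 0.5 * gamma * l1norm ** 2
# chain rule: gradient = gamma * l1norm * sign(beta)
grad_squared_l1 = gamma * l1norm * np.sign(beta)

# plain L1 penalty: gamma * l1norm, with gradient gamma * sign(beta)
grad_plain_l1 = gamma * np.sign(beta)
```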
@ddbourgin Sorry, but I don't quite understand why the penalty in the L1 case needs the square like L2 does:
penalty = 0.5 * self.gamma * np.linalg.norm(self.beta, ord=order) ** 2  # the square remains even in the l1 case
All the articles I have seen use an L1 term (penalty) like \lambda * \sum_i |\beta_i|, and the derivative is +-\lambda. Now I am very confused.
Oh! I see what you're saying. You're right, the square of the L1 norm is not what we want. The proper L1 penalty is gamma * np.abs(beta).sum(), which gives a gradient of gamma * np.sign(beta).
I'll make a PR to fix this. Thank you very much for pointing this out :)
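A rough sketch of what the corrected penalty/gradient pair might look like (not the actual patch; gamma, beta, and order stand in for the model's attributes):

```python
import numpy as np

def penalty(beta, gamma, order=2):
    # gamma * ||beta||_1 for order=1, (gamma / 2) * ||beta||_2^2 for order=2
    if order == 1:
        return gamma * np.abs(beta).sum()
    return 0.5 * gamma * (beta @ beta)

def d_penalty(beta, gamma, order=2):
    # gradient of the penalty wrt beta
    if order == 1:
        return gamma * np.sign(beta)
    return gamma * beta
```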
Hello, this is a great project. I am learning how to implement models without sklearn/tensorflow, and it has really helped me a lot.
I have a question on https://github.com/ddbourgin/numpy-ml/blob/4f37707c6c7c390645dec5a503c12a48e624b249/numpy_ml/linear_models/lm.py#L252
Since the p-norm is defined as ||x||_p = (sum_i |x_i|^p)^(1/p), l1norms(self.beta) means the sum of the absolute values of each element in self.beta (see the quick check at the end of this post). I don't quite understand why the simple gamma * beta stands for L2?

PS: May I ask what IDE and code documentation plugin you are using? I see some of the annotations are raw LaTeX; it would be nicer to see properly rendered math symbols than the raw LaTeX :)
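As an aside on the p-norm definition mentioned above, a quick check (placeholder vector) that np.linalg.norm matches ||x||_p = (sum_i |x_i|^p)^(1/p) for p = 1 and p = 2:

```python
import numpy as np

x = np.array([0.5, -2.0, 3.0])

l1 = np.abs(x).sum()                    # ||x||_1: sum of absolute values
l2 = np.sqrt((np.abs(x) ** 2).sum())    # ||x||_2

print(np.isclose(l1, np.linalg.norm(x, ord=1)))  # True
print(np.isclose(l2, np.linalg.norm(x, ord=2)))  # True
```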