ddbourgin / numpy-ml

Machine learning, in numpy
https://numpy-ml.readthedocs.io/
GNU General Public License v3.0

LogisticRegression regularization loss not right #26

Closed jjjjohnson closed 5 years ago

jjjjohnson commented 5 years ago

Based on my understanding, the l1 and l2 loss should be: d_penalty = gamma * np.square(beta).sum() if p == "l2" else gamma * np.abs(beta).sum() instead of: d_penalty = gamma * beta if p == "l2" else gamma * np.sign(beta) https://github.com/ddbourgin/numpy-ml/blob/165ad88ea0e75c76958a3c4ddfac97c2b1c14561/linear_models/lm.py#L171

Source: https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c Please double-check! Thanks
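For reference, a minimal sketch (not the repo's code) of the two expressions being compared here; the proposed penalty versions reduce to scalars, while the line currently in lm.py produces vectors with the same shape as beta:

```python
import numpy as np

# Illustrative values only; gamma and beta are placeholders, not the repo's defaults.
gamma = 0.1
beta = np.array([0.5, -1.2, 2.0])

# Penalty terms added to the loss (what the proposed change computes): scalars.
l2_penalty = gamma * np.square(beta).sum()
l1_penalty = gamma * np.abs(beta).sum()

# Expressions currently on the highlighted line in lm.py: vectors, same shape as beta.
l2_expr = gamma * beta
l1_expr = gamma * np.sign(beta)
```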

jaymody commented 5 years ago

That seems correct, so I implemented the change, tested it, and got the following results:

[Screenshot of test results: Screen Shot 2019-07-14 at 10 53 43 PM]

The adjusted regularization doesn't seem to break anything, though the loss went up slightly on some of the examples. That may not mean much, since I also tested the code with no regularization at all and the results were similar to those with the current regularization implementation.

I put in a PR (#28 ) to implement the change.

ddbourgin commented 5 years ago

@jjjjohnson - I don't think this is correct. The line you highlighted, and which @JayMody modified in #28, is the gradient of the regularization penalty, not the penalty itself (the penalty is calculated here). Note that if we have a regularization penalty of (gamma/2) * l2norm(beta) ** 2, the gradient (i.e., the vector of partial derivatives of the regularization penalty w.r.t. each dimension of beta) is exactly gamma * beta.
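A quick finite-difference check (a sketch, not from the repo) confirms this: the gradient of (gamma/2) * ||beta||_2^2 is gamma * beta, and the (sub)gradient of gamma * ||beta||_1 is gamma * sign(beta), away from zero.

```python
import numpy as np

# Illustrative values; gamma, beta, and eps are placeholders.
rng = np.random.RandomState(0)
beta, gamma, eps = rng.randn(5), 0.1, 1e-6

def l2_penalty(b):
    return 0.5 * gamma * np.sum(b ** 2)   # (gamma / 2) * ||b||_2^2

def l1_penalty(b):
    return gamma * np.abs(b).sum()        # gamma * ||b||_1

def numeric_grad(f, b):
    # Central differences along each coordinate.
    return np.array([(f(b + eps * e) - f(b - eps * e)) / (2 * eps) for e in np.eye(len(b))])

print(np.allclose(numeric_grad(l2_penalty, beta), gamma * beta))           # True
print(np.allclose(numeric_grad(l1_penalty, beta), gamma * np.sign(beta)))  # True
```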

Having said that, your comment did make me realize that I was sloppy in lining up my gradient calc with the expression I had for the NLL -- according to the line you highlighted, I assumed an un-squared regularization penalty for the 1-norm and a squared norm penalty for the 2-norm. Oops! Will modify this momentarily.
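For anyone following along, a minimal sketch (the function names and structure are illustrative, not the repo's implementation) of a penalty/gradient pair that stays consistent for both norms:

```python
import numpy as np

def penalty(beta, gamma, p):
    if p == "l2":
        return 0.5 * gamma * np.sum(beta ** 2)   # (gamma / 2) * ||beta||_2^2
    return gamma * np.abs(beta).sum()            # gamma * ||beta||_1

def d_penalty(beta, gamma, p):
    if p == "l2":
        return gamma * beta                      # gradient of the squared 2-norm term
    return gamma * np.sign(beta)                 # (sub)gradient of the 1-norm term
```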