TomAugspurger closed this issue 7 years ago.
This is a good find; philosophical question: is this an issue for ADMM or for the Logistic family? I imagine we could control the overflow from within the gradient / function calls there and leave ADMM untouched.
> This is a good find; philosophical question: is this an issue for ADMM or for the Logistic family?
Oh good call. The overflow is happening in the exponential of the loglike step.
Tried fixing this over here; the change prevents the overflow.
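For context, here is a minimal sketch (assuming NumPy; not necessarily the exact change in that branch) of how the overflow in the logistic log-likelihood can be avoided by never forming the exponential of large values directly:

```python
import numpy as np

def logistic_loglike(Xbeta, y):
    """Logistic negative log-likelihood, computed without overflow.

    The naive form np.log1p(np.exp(Xbeta)) overflows for large Xbeta;
    np.logaddexp(0, Xbeta) evaluates log(1 + exp(Xbeta)) without ever
    materializing exp(Xbeta).
    """
    return np.sum(np.logaddexp(0, Xbeta)) - np.dot(y, Xbeta)

# Values that would overflow np.exp stay finite here.
Xbeta = np.array([-800.0, 0.0, 800.0])
y = np.array([0.0, 1.0, 1.0])
print(logistic_loglike(Xbeta, y))  # finite, no overflow warning
```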
The default penalty for scikit-learn's `LogisticRegression` is actually `l2`, so in your experiment above the two algorithms are solving different problems (our default penalty for `admm` is `l1`). (I imagine that happened because you switched out of our API, which has the same defaults.)
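To make the comparison apples-to-apples, scikit-learn's penalty can be set to `l1` explicitly. A small sketch (the data, solver, and `C` are illustrative choices, not values from this thread):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data standing in for the dataset used in the experiment above.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# penalty='l1' matches the default regularizer of our admm implementation;
# liblinear is one of the scikit-learn solvers that supports an l1 penalty.
skl = LogisticRegression(penalty='l1', solver='liblinear', C=1.0)
skl.fit(X, y)
print(skl.coef_)
```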
However, after accounting for this the two algorithms are still giving different results.
I believe this is because somewhere under the hood, scikit-learn's `fit` normalizes the columns of X, and then un-normalizes the coefficient estimates at the end. This is probably a good idea for us, too; I might add that functionality to my branch before I PR -- I'm sure there are some performance implications of doing that that I should consider first.
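A rough sketch of that normalize/un-normalize pattern, assuming NumPy (the `solve` argument is a hypothetical stand-in for any of our solvers, not an existing API):

```python
import numpy as np

def fit_normalized(solve, X, y):
    """Fit on column-scaled X, then map coefficients back to the original scale.

    `solve` is a hypothetical callable (X, y) -> beta. Scaling column j by
    1/s_j means the returned coefficients must be divided by s_j to apply
    to the unscaled features.
    """
    scale = X.std(axis=0)
    scale[scale == 0] = 1.0            # leave constant columns untouched
    beta_scaled = solve(X / scale, y)  # solve the better-conditioned problem
    return beta_scaled / scale         # un-normalize the coefficient estimates
```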
(edited to not require the new API)
Probably other algorithms too. I see that `proximal_grad` does handle it correctly. This little script compares the fit for scikit-learn `LogisticRegression` and our `admm` against a no-overflow baseline, using a copy of the dataset with `n` values in a range that will overflow when passed through the sigmoid function.

Outputs:
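As a purely hypothetical sketch of that comparison (not the actual script or its outputs), assuming NumPy and scikit-learn: scale a copy of the features so that `X @ beta` is large enough to overflow a naive sigmoid, then compare the fitted coefficients against the baseline fit:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X_overflow = X * 1e4  # large values so exp(X @ beta) overflows in a naive sigmoid

for name, data in [("baseline", X), ("overflow-prone", X_overflow)]:
    coef = LogisticRegression(penalty="l1", solver="liblinear").fit(data, y).coef_
    print(name, coef)
```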