Open lorentzenchr opened 1 year ago
If n_features > n_samples, then it's likely that the data is perfectly separable, and so the minimizer does not even exist, no? All algorithms go to infinity (at a log log speed).
@tomMoral what's our policy here: 3 separate repos (L1, L2, no penalty) or a single one with some solvers skipping configs?
If n_features > n_samples, then it's likely that the data is perfectly separable, and so the minimizer does not even exist, no? All algorithms go to infinity (at a log log speed).
Not quite: the solution is just not unique. In that case, a desirable property of the coefficients is to have the minimum L2 norm among all solutions. But that is hardly ever guaranteed by any solver.
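To make the point concrete, here is a minimal sketch (not from this exchange; toy data and constants are made up, scikit-learn assumed): on linearly separable data, weakening the L2 penalty lets the coefficient norm grow without bound while the training loss keeps shrinking, which is the "algorithms go to infinity" behaviour.

```python
# Sketch only: toy separable data, scikit-learn assumed; numbers are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))        # n_features > n_samples
y = (X[:, 0] > 0).astype(int)        # labels depend on one feature -> separable

for C in [1e0, 1e2, 1e4, 1e6]:       # larger C means weaker L2 penalty
    clf = LogisticRegression(C=C, max_iter=10_000).fit(X, y)
    print(
        f"C={C:.0e}  ||w||={np.linalg.norm(clf.coef_):8.2f}"
        f"  train log-loss={log_loss(y, clf.predict_proba(X)):.2e}"
    )
# Expected trend: ||w|| keeps growing while the training loss keeps shrinking,
# i.e. there is no finite unpenalized minimizer for the iterates to settle on.
```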
If there is a vector $w$ which perfectly classifies the training data, then $t \cdot w$ with scalar $t$ going to infinity makes the logistic loss go to 0, because $-t \, y_i \, x_i^\top w$ goes to $-\infty$ for all training samples. Since $\log(1 + \exp(u)) > 0$ for every finite $u$, a loss value of exactly 0 is unreachable, hence there is no minimizer in that case (see the discussion above Equation 2 in https://arxiv.org/pdf/1710.10345.pdf).
Edit: it's still a very interesting problem
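For illustration, a tiny numeric check of that scaling argument (toy data made up here, not taken from the paper or the thread):

```python
# Toy check: scale a separating vector w by t and watch the logistic loss
# decrease toward 0 while staying strictly positive for any finite t.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
w = np.array([1.0, -0.5, 2.0])
y = np.sign(X @ w)                   # labels in {-1, +1}, so w separates by construction

def logistic_loss(t):
    margins = y * (X @ (t * w))      # all positive, since w separates the data
    return np.sum(np.log1p(np.exp(-margins)))

for t in [1, 10, 100]:
    print(f"t={t:>4}  loss={logistic_loss(t):.3e}")   # decreasing, never exactly 0
```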
Thanks for the linked literature.
Since log(1 + exp(u)) > 0, a loss value of exactly 0 is unreachable.
A good implementation will reach a loss very close to zero; e.g., I got a total loss of about n_samples * 1e-10 on the 20 news dataset.
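A rough sketch of how such a check could look (assumptions: scikit-learn >= 1.2 for penalty=None; the exact value depends on solver, tolerance and iteration budget, so the n_samples * 1e-10 figure should not be expected exactly):

```python
# Sketch, not the exact script used above. Assumes scikit-learn >= 1.2
# (penalty=None); exact figures depend on solver, tol and max_iter.
from sklearn.datasets import fetch_20newsgroups_vectorized
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

X, y = fetch_20newsgroups_vectorized(return_X_y=True)   # n_features >> n_samples
clf = LogisticRegression(penalty=None, solver="lbfgs", max_iter=10_000, tol=1e-12)
clf.fit(X, y)

# normalize=False sums over samples, so the result is comparable to n_samples * 1e-10.
total_loss = log_loss(y, clf.predict_proba(X), normalize=False)
print(f"n_samples = {X.shape[0]},  total training log-loss = {total_loss:.3e}")
```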
Yes, the objective converges to 0, but the iterates are diverging (there is no minimizer, i.e. the argmin is empty and the min is in fact an infimum).
I'm transferring the issue to benchmark_logreg_l2 as it seems to be less directly linked to benchopt.
PR welcome :)
The problem sets of logistic regression all have a penalty. It would be very interesting, at least to me, to add the zero penalty case.
Note: For n_features > n_samples, like the 20 news dataset, this is real fun (from an optimization point of view).
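For reference, a minimal sketch (plain NumPy, deliberately not using benchopt's own API) of the unpenalized objective such a zero-penalty problem would minimize:

```python
# Plain-NumPy sketch (not benchopt's API) of the unpenalized objective
#   f(w) = sum_i log(1 + exp(-y_i * x_i^T w))
# that a zero-penalty logistic regression problem would minimize.
import numpy as np

def logreg_loss_grad(w, X, y):
    """Unpenalized logistic loss and gradient; y is expected in {-1, +1}."""
    margins = y * (X @ w)
    loss = np.sum(np.logaddexp(0.0, -margins))        # overflow-safe log(1 + exp(-m))
    # d/dw log(1 + exp(-m_i)) = -y_i * x_i / (1 + exp(m_i))
    grad = -X.T @ (y / (1.0 + np.exp(margins)))
    return loss, grad

# An L2-penalized variant would simply add 0.5 * lam * ||w||^2 to the loss and
# lam * w to the gradient; with no such term, gradient-based solvers will drive
# ||w|| toward infinity on separable data, which is what makes this case fun.
```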