Infinite loop or never returns for logistic regression in nearly degenerate case using scikit learn

MarvinT commented 8 years ago

Description

With scikit-learn, LogisticRegression.fit never returns when the data is nearly degenerate. The scikit-learn developers pointed to liblinear as the source of the problem.

Steps/Code to Reproduce

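import sklearn.linear_model
import numpy as np

model = sklearn.linear_model.LogisticRegression()
num_pts = 15
# 30 samples with 2 features each, all zero to start with
x = np.zeros((num_pts*2, 2))
# Row 3 gets one extremely small value (assigned to both of its columns)
x[3] = 3.7491010398553741e-208
# First 15 labels are 0, last 15 are 1
y = np.append(np.zeros(num_pts), np.ones(num_pts))
model.fit(x, y)  # this call never returns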

Expected Results

Return or throw error.

Actual Results

Never returns.

Versions

Linux-2.6.32-573.18.1.el6.x86_64-x86_64-with-redhat-6.7-Carbon
('Python', '2.7.12 |Anaconda 2.0.1 (64-bit)| (default, Jul 2 2016, 17:42:40) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]')
('NumPy', '1.11.0')
('SciPy', '0.17.0')
('Scikit-Learn', '0.17.1')

amueller commented 8 years ago

Can you try to reproduce it with the command line interface? Otherwise it might be a numerical issue caused by us (sklearn). Also, how about scaling your data ;)
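
For readers wondering what "scaling your data" could look like here, the following is a minimal sketch, not something posted in the thread. It divides each column by its largest absolute value, so the tiny entry from the report becomes 1.0. The helper name `max_abs_scale` is hypothetical, and the thread does not confirm whether scaling avoids the hang.

```python
import numpy as np

def max_abs_scale(x):
    """Divide each column by its largest absolute value.

    Hypothetical helper for illustration. Columns that are entirely
    zero are left unchanged to avoid dividing by zero.
    """
    scale = np.abs(x).max(axis=0)
    scale[scale == 0.0] = 1.0  # leave all-zero columns as they are
    return x / scale

# Same data as in the report
num_pts = 15
x = np.zeros((num_pts * 2, 2))
x[3] = 3.7491010398553741e-208
x_scaled = max_abs_scale(x)
```

scikit-learn also ships `sklearn.preprocessing.MaxAbsScaler`, which applies the same kind of per-column scaling inside a pipeline.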

infwinston commented 8 years ago

Thanks for reporting this issue. We looked into it and found that the problem comes from the gradient norm being too small at the beginning, which leads to an infinite loop in the conjugate gradient (CG) subroutine. This can be fixed by setting a maximum number of CG iterations. We are going to fix it in the next release. Thanks.
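
To illustrate the idea of capping CG iterations, here is a minimal Python sketch. It is not liblinear's actual code: the function name `capped_cg`, the `hess_vec` callback, the `rel_tol` default, and the default cap are all assumptions made for this example. The loop stops either when the residual is small relative to the gradient norm or when the iteration cap is reached, so it terminates even if the tolerance test never succeeds.

```python
import numpy as np

def capped_cg(hess_vec, g, rel_tol=0.1, max_iter=None):
    """Approximately solve H s = -g with conjugate gradient.

    hess_vec(v) must return H @ v for a symmetric matrix H.
    All names and defaults here are hypothetical, for illustration only.
    """
    n = g.shape[0]
    if max_iter is None:
        max_iter = max(n, 5)  # arbitrary cap chosen for this sketch
    s = np.zeros(n)
    r = -g.copy()             # residual of H s = -g at s = 0
    d = r.copy()              # first search direction
    rtr = r @ r
    cg_tol = rel_tol * np.linalg.norm(g)
    iters = 0
    while iters < max_iter:   # the cap guarantees termination
        if np.sqrt(rtr) <= cg_tol:
            break             # residual is small enough
        hd = hess_vec(d)
        dhd = d @ hd
        if not dhd > 0:
            break             # zero, negative, or NaN curvature
        alpha = rtr / dhd
        s += alpha * d
        r -= alpha * hd
        rtr_new = r @ r
        d = r + (rtr_new / rtr) * d
        rtr = rtr_new
        iters += 1
    return s, iters

# Usage on a small symmetric positive definite system
H = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([-1.0, -2.0])
s, iters = capped_cg(lambda v: H @ v, g, rel_tol=1e-10)
```

The `not dhd > 0` test is written that way on purpose: a NaN curvature compares false against everything, so a plain `dhd <= 0` check would let it through.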

MarvinT commented 8 years ago

Thanks, that's awesome.

Sorry for not providing a more precise source of the error.

simsong commented 6 years ago

This issue was moved to angleto/liblinear#10
