Xtra-Computing / thundersvm

ThunderSVM: A Fast SVM Library on GPUs and CPUs
Apache License 2.0

get_coef has off by one error #147

Closed Palpatineli closed 5 years ago

Palpatineli commented 5 years ago

After fitting a model, model.coef_ looks wrong. The last element is always 0, and if you compare it to the coef_ from sklearn's SVC you will see that every value is shifted by one position. For example, in one of the runs I have:

# X, y: training data from this run (not shown here)
from thundersvm import SVC as tSVC
from sklearn.svm import SVC as kSVC
k_svc = kSVC(kernel='linear', gamma=1E-7, C=1)
t_svc = tSVC(kernel='linear', gamma=1E-7, C=1)
k_svc.fit(X, y)
t_svc.fit(X, y)

In[9]: k_svc.coef_
Out[9]: array([[ 1.32950959, 0.97228444, -2.16840056, 2.30751586, 0.12430373, 0.24256619, 1.91916157, -1.14547476, 0.92084549, -1.0261113 ]])
In[11]: k_svc.intercept_
Out[11]: array([0.10570043])

In[10]: t_svc.coef_
Out[10]: array([[ 0.97205472, -2.16802955, 2.3074162 , 0.12444011, 0.24210396, 1.91905808, -1.14513373, 0.92070591, -1.02566159, 0. ]])
In[12]: t_svc.intercept_
Out[12]: array([-0.10579848])

Otherwise both models predict the same results. I went into the thundersvmScikit.py file and found that the C function thundersvm.get_coef already returns the off-by-one result.
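The shift described above can be checked numerically rather than by eye. The following is a minimal sketch using only NumPy and the two coefficient arrays quoted in this comment; the helper name looks_shifted_by_one and the tolerance of 1e-3 are assumptions made for illustration, not part of either library.

```python
import numpy as np

def looks_shifted_by_one(reference, suspect, atol=1e-3):
    """Return True if `suspect` equals `reference` shifted left by one
    position, with a trailing 0 filling the last slot."""
    reference = np.ravel(reference)
    suspect = np.ravel(suspect)
    if reference.shape != suspect.shape:
        return False
    # The two vectors should NOT match position by position...
    aligned = np.allclose(suspect, reference, atol=atol, rtol=0)
    # ...but should match once the reference drops its first element.
    shifted = np.allclose(suspect[:-1], reference[1:], atol=atol, rtol=0)
    return bool(not aligned and shifted and suspect[-1] == 0.0)

# Values copied from the outputs quoted in this comment.
k_coef = np.array([[1.32950959, 0.97228444, -2.16840056, 2.30751586,
                    0.12430373, 0.24256619, 1.91916157, -1.14547476,
                    0.92084549, -1.0261113]])
t_coef = np.array([[0.97205472, -2.16802955, 2.3074162, 0.12444011,
                    0.24210396, 1.91905808, -1.14513373, 0.92070591,
                    -1.02566159, 0.0]])

print(looks_shifted_by_one(k_coef, t_coef))
```

The tolerance is loose on purpose: the two solvers agree only to about three decimal places, so an exact comparison would hide the pattern.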

If this is intentional, how do I recover the coefficients? If it is not, can you fix it?

jiahuanluo commented 5 years ago

Hi @Palpatineli,

Could you please share a subset of your dataset and script? They will help us fix this efficiently.

Palpatineli commented 5 years ago

Hi @jiahuanluo, I tried it on an artificial dataset and the same error appears:

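import numpy as np
from thundersvm import SVC

np.random.seed(12345)
# 10 true weights plus one offset term used to generate the labels
coef = np.random.randn(11)
X = np.random.randn(100, 10)
# Labels in {-1, +1}, separated by the hyperplane defined by coef
y = (X.dot(coef[0: 10, np.newaxis]) > -coef[10]).ravel() * 2 - 1
svc = SVC(kernel='linear', gamma=1E-5, C=1)
svc.fit(X, y)
print(svc.coef_)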

and it prints:

In[8]: print(svc.coef_)
[[-0.3202174 0.19284329 0.04366636 -1.92630565 -1.4033004 -0.46049836 -0.06861141 -1.01132667 -1.38034987 0. ]]

while the same X and y with sklearn SVC gives:

kSVC(kernel='linear', gamma=1E-5, C=1).fit(X, y).coef_

array([[-0.24643811, 0.31998003, -0.19304263, -0.04345196, 1.9264185 , 1.4027516 , 0.46053458, 0.06927284, 1.011293 , 1.38041789]])

It was run on Debian sid, compiled with gcc-7, on GPU (CUDA 9), from commit 3e514.
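As a cross-check that does not depend on any library's coef_ attribute, the weight vector of a binary linear SVM can be recomputed from the dual coefficients and the support vectors, since w is the sum of alpha_i * y_i * x_i over the support vectors. The sketch below uses only NumPy and a tiny hand-made example. It assumes the fitted model exposes dual_coef_ (shape 1 x n_SV, already multiplied by the labels) and support_vectors_ in the way sklearn's SVC does; whether the thundersvm wrapper at this commit exposes them identically is an assumption, and the helper name linear_coef_from_dual is hypothetical.

```python
import numpy as np

def linear_coef_from_dual(dual_coef, support_vectors):
    """Recompute the primal weights of a binary linear SVM.

    dual_coef:        shape (1, n_SV), entries alpha_i * y_i
    support_vectors:  shape (n_SV, n_features)
    Returns an array of shape (1, n_features).
    """
    dual_coef = np.atleast_2d(np.asarray(dual_coef, dtype=float))
    support_vectors = np.asarray(support_vectors, dtype=float)
    # w = sum_i (alpha_i * y_i) * x_i, written as a matrix product
    return dual_coef @ support_vectors

# Hand-made illustration, not output from either library.
support_vectors = np.array([[1.0, 2.0],
                            [3.0, 4.0]])
dual_coef = np.array([[0.5, -0.5]])

w = linear_coef_from_dual(dual_coef, support_vectors)
print(w)
```

With a real model, one would pass model.dual_coef_ and model.support_vectors_ and compare the result against model.coef_; a mismatch in position rather than in value would point to an indexing problem like the one reported here.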

QinbinLi commented 5 years ago

Hi @Palpatineli

Please update the code to the latest version. This bug existed in your version but we fixed it later. I have tried your code in the latest library and it works fine. Thanks.