The probability from sklearn is different from thundersvm

ZesenChen commented 5 years ago

Hello, I am trying to replace sklearn's SVC with thundersvm, but the probability output looks noticeably worse, even though the binary classification results are as good as sklearn's. I wrote the following code as a test, and the probabilities from thundersvm and sklearn differ. Can you give me some advice?

from sklearn.svm import SVC
import thundersvm
import numpy as np

a = np.random.rand(1000,10)
b = np.zeros((1000,))
b[:300] = 1
clf1 = SVC(probability=True)
clf2 = thundersvm.SVC(probability=True)

clf1.fit(a,b)
clf2.fit(a,b)

c = np.random.rand(10,10)
print(clf1.predict_proba(c))
print(clf2.predict_proba(c))

ZesenChen commented 5 years ago

I figured it out: I made a silly mistake. The column order of the probabilities differs from sklearn.svm.SVC. sklearn's SVC returns the columns as [probability of class 0, probability of class 1], while the column order in thundersvm.SVC depends on the label of the first training example. I was using it in a multi-label learning experiment, which is why the results looked poor.
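For anyone hitting the same thing, here is a minimal sketch of how the columns could be realigned to sklearn's ascending label order. It assumes that thundersvm orders the columns by the order in which labels first appear in the training targets, which is my reading of the behavior described above and not something confirmed by the maintainers. The helper names `first_appearance_order` and `reorder_to_sorted` are hypothetical, and the `proba` array is a hand-written stand-in for a `predict_proba` result, not real output.

```python
import numpy as np

def first_appearance_order(y):
    """Return the distinct labels of y in the order they first appear."""
    labels, first_idx = np.unique(y, return_index=True)
    return labels[np.argsort(first_idx)]

def reorder_to_sorted(proba, column_labels):
    """Reorder probability columns so they follow ascending label order."""
    column_labels = np.asarray(column_labels)
    return proba[:, np.argsort(column_labels)]

# Same labels as in the test above: the first 300 examples are class 1.
y = np.zeros(1000)
y[:300] = 1

# ASSUMPTION: columns follow first-appearance order, here class 1 then class 0.
assumed_order = first_appearance_order(y)

# Hand-written stand-in for a predict_proba result with columns in assumed_order.
proba = np.array([[0.3, 0.7],
                  [0.2, 0.8]])

# After reordering, column 0 is class 0 and column 1 is class 1, as in sklearn.
aligned = reorder_to_sorted(proba, assumed_order)
print(aligned)
```

If the assumption does not hold for your thundersvm version, pass the actual column labels to `reorder_to_sorted` instead of deriving them from `y`.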

zeyiwen commented 5 years ago

Thanks for pointing this out! We will consider making this consistent with sklearn in a future upgrade.

wenlibin02 commented 4 years ago

It is really misleading to use the first training example to determine the order of the probability columns.

zeyiwen commented 4 years ago

That is true. You are more than welcome to contribute and improve thundersvm.

Xtra-Computing / thundersvm

The probability from sklearn is different from thundersvm #134