Xtra-Computing / thundersvm

ThunderSVM: A Fast SVM Library on GPUs and CPUs
Apache License 2.0

SVC doesn't work with sklearn OneVsRestClassifier #229

Open hblab-anhnt opened 3 years ago

hblab-anhnt commented 3 years ago

I need to process a large training set with a OneVsRestClassifier(SVC) model. Because of the size of the training data I need GPU support, so I moved from sklearn to thundersvm. After the switch, however, the results became much worse. How can I fix this? The code below reproduces the problem:

# Get the thundersvm test data (notebook shell command)
!git clone https://github.com/Xtra-Computing/thundersvm.git
from thundersvm import SVC
from sklearn.datasets import load_svmlight_file
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC as oldSVC

# Load the thundersvm test data (the same file is used for training and evaluation)
x, y = load_svmlight_file("../dataset/test_dataset.txt")
x2, y2 = load_svmlight_file("../dataset/test_dataset.txt")

# sklearn OneVsRestClassifier(SVC) -> score 0.98. It works well.
clf = OneVsRestClassifier(oldSVC(verbose=True, gamma=0.5, C=100))
clf.fit(x, y)
y_predict = clf.predict(x2)
score = clf.score(x2, y2)
print(score)

# thundersvm OneVsRestClassifier(SVC) -> score 0.02. It is much worse.
clf = OneVsRestClassifier(SVC(verbose=True, gamma=0.5, C=100))
clf.fit(x, y)
y_predict = clf.predict(x2)
score = clf.score(x2, y2)
print(score)

Please help me fix this. Our training data is huge, so building the model without GPU support is infeasible.

hblab-anhnt commented 3 years ago

Can anyone help me?

Kurt-Liuhf commented 3 years ago

Hi @hblab-anhnt, can you provide some data that helps us reproduce your results?

zeyiwen commented 3 years ago

@hblab-anhnt ThunderSVM only supports one-vs-one for multi-class classification, which often produces results competitive with one-vs-rest. Would you try one-vs-one? I will mark this issue as an enhancement so that we can work on it in a future upgrade.
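For reference, the one-vs-rest reduction itself can be written by hand on top of any binary classifier. The following is only a minimal sketch of the general technique, not ThunderSVM's or sklearn's implementation. It assumes each binary estimator exposes `fit(X, y)` and a `decision_function(X)` where a larger value means "more likely the positive class"; whether ThunderSVM's `SVC` follows that convention is not confirmed in this thread. `CentroidScorer`, `fit_one_vs_rest` and `predict_one_vs_rest` are hypothetical names, and the toy scorer exists only so the example runs without extra packages on plain Python lists.

```python
class CentroidScorer:
    """Toy binary scorer (hypothetical stand-in for a real binary SVM).

    Score = squared distance to the negative centroid minus squared
    distance to the positive centroid, so larger means "more positive".
    """

    def fit(self, X, y):
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        self.pos_ = [sum(col) / len(pos) for col in zip(*pos)]
        self.neg_ = [sum(col) / len(neg) for col in zip(*neg)]
        return self

    def decision_function(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [sq_dist(x, self.neg_) - sq_dist(x, self.pos_) for x in X]


def fit_one_vs_rest(make_binary, X, y):
    """Train one binary model per class: that class (1) against the rest (0)."""
    classes = sorted(set(y))
    models = []
    for c in classes:
        binary_y = [1 if t == c else 0 for t in y]
        models.append(make_binary().fit(X, binary_y))
    return classes, models


def predict_one_vs_rest(classes, models, X):
    """Predict the class whose binary model gives the highest score."""
    scores = [m.decision_function(X) for m in models]  # one row per class
    preds = []
    for per_sample in zip(*scores):  # one tuple of class scores per sample
        best = max(range(len(classes)), key=lambda i: per_sample[i])
        preds.append(classes[best])
    return preds


# Tiny three-class example with well-separated clusters.
X = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [0.0, 5.0], [0.2, 5.1]]
y = [0, 0, 1, 1, 2, 2]
classes, models = fit_one_vs_rest(CentroidScorer, X, y)
preds = predict_one_vs_rest(classes, models, X)
print(len(models), preds)
```

The reduction only works if the per-class scores are comparable and oriented the same way, which is why the sign and shape of the binary decision values matter when swapping in a different base estimator.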

hblab-anhnt commented 3 years ago

@zeyiwen So should I use OneVsRestClassifier(SVC(decision_function_shape='ovo')) or OneVsOneClassifier(SVC())? However, I prefer one-vs-rest to one-vs-one because of complexity and prediction speed. With n classes, one-vs-rest creates n models, but one-vs-one creates n(n-1)/2 models, which increases complexity and training/prediction time.
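The model counts quoted above can be checked by enumerating the binary subproblems each strategy creates. This is a small illustrative sketch; the helper names `one_vs_rest_problems` and `one_vs_one_problems` are hypothetical and not part of sklearn or thundersvm.

```python
from itertools import combinations


def one_vs_rest_problems(classes):
    """One binary problem per class: that class against all others."""
    return [(c, "rest") for c in classes]


def one_vs_one_problems(classes):
    """One binary problem per unordered pair of distinct classes."""
    return list(combinations(classes, 2))


for n in (3, 10, 100):
    classes = list(range(n))
    n_ovr = len(one_vs_rest_problems(classes))
    n_ovo = len(one_vs_one_problems(classes))
    # The pair count matches the closed form n(n-1)/2.
    assert n_ovo == n * (n - 1) // 2
    print(f"n={n}: one-vs-rest={n_ovr} models, one-vs-one={n_ovo} models")
```

Note that each one-vs-one problem is trained only on the samples of its two classes, while each one-vs-rest problem is trained on all samples, so the number of models alone does not settle which strategy trains faster.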

hblab-anhnt commented 3 years ago

@zeyiwen @Kurt-Liuhf Sorry, I forgot to add the data-loading code. I use the test data from thundersvm:

!git clone https://github.com/Xtra-Computing/thundersvm.git
from sklearn.datasets import load_svmlight_file
x, y = load_svmlight_file("../dataset/test_dataset.txt")
x2, y2 = load_svmlight_file("../dataset/test_dataset.txt")