Xtra-Computing / thundersvm

ThunderSVM: A Fast SVM Library on GPUs and CPUs
Apache License 2.0

GridSearchCV - n_jobs #239

Open itsciccio opened 3 years ago

itsciccio commented 3 years ago

Hi,

I am tuning hyperparameters using Scikit's GridSearchCV as such:

grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=3, cv=skf.split(x, y), n_jobs=-1, scoring='accuracy')

and I get this error...

BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

Note that this works well with the regular SVC from scikit-learn. The only difference is that I want to use thundersvm's SVC for its numerous benefits. I am sure this is an issue with the n_jobs parameter.
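Since the error message asks whether the arguments are picklable, a quick round trip through Python's standard pickle module is one way to check a given object. This is only a minimal sketch: `pickle_round_trip` is a hypothetical helper name, and passing this check in the parent process does not guarantee that a worker process can un-serialize the same object.

```python
import pickle


def pickle_round_trip(obj):
    """Return (True, None) if obj survives pickle.dumps/loads, else (False, error)."""
    try:
        pickle.loads(pickle.dumps(obj))
    except Exception as exc:  # the exception type varies with the object
        return False, exc
    return True, None


# Plain data round-trips; a lambda does not, because pickle stores
# functions by importable name and a lambda has none.
print(pickle_round_trip({"C": [1, 10], "kernel": ["rbf"]})[0])
print(pickle_round_trip(lambda x: x)[0])
```

The same call can be applied to the estimator instance and to param_grid before handing them to GridSearchCV.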

I have also tried this:

grid = GridSearchCV(SVC(n_jobs=-1), param_grid, refit=True, verbose=3, cv=skf.split(x, y), scoring='accuracy')

but it does not seem to speed things up. I have 6 cores, running CUDAv11 on an RTX 2060 Super. Any help would be appreciated!

QinbinLi commented 3 years ago

Hi @itsciccio ,

ThunderSVM currently does not support setting n_jobs=-1 in scikit-learn's cross-validation utilities such as GridSearchCV. You can instead set n_jobs inside SVC(), which controls the number of CPU threads and defaults to -1 (i.e., the maximum number of threads). Since the computation usually happens on the GPU, increasing the number of CPU threads may not give a significant speedup.
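In other words, the search itself runs in a single process and any parallelism is left to the estimator. A minimal sketch of that pattern in plain Python follows. `sequential_grid_search` and `score_fn` are hypothetical names, not part of thundersvm or scikit-learn, and the toy scorer stands in for a real fit-and-score step such as cross-validating `SVC(n_jobs=-1, **params)`.

```python
from itertools import product


def sequential_grid_search(param_grid, score_fn):
    """Evaluate every parameter combination one after another in this process.

    param_grid: dict mapping parameter name -> list of candidate values.
    score_fn:   hypothetical callback taking a params dict and returning a
                score (higher is better), e.g. a mean cross-validation accuracy.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Stand-in scorer so the sketch runs without a GPU or thundersvm installed;
# a real one would fit an estimator built from `params` and score it.
def toy_score(params):
    return -abs(params["C"] - 10) - abs(params["gamma"] - 0.1)


best, score = sequential_grid_search(
    {"C": [1, 10, 100], "gamma": [0.01, 0.1]}, toy_score
)
print(best, score)
```

With scikit-learn, the equivalent is to leave n_jobs unset on GridSearchCV, so that candidates are evaluated sequentially by default, and to keep n_jobs on the thundersvm SVC.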