Xtra-Computing / thundersvm

ThunderSVM: A Fast SVM Library on GPUs and CPUs
Apache License 2.0
Thundersvm with caret package in R #79

Open mjmg opened 6 years ago

mjmg commented 6 years ago

Please advise whether thundersvm can be wrapped as a custom model for the caret package, similar to this example:

https://stackoverflow.com/questions/29449639/svm-in-r-with-caret-using-e1071-instead-of-kernlab

If examples can be provided, that would be much appreciated. A caret wrapper would greatly simplify the R code for cross validation.

QinbinLi commented 6 years ago

I'm sorry, I wasn't familiar with the caret package. After reading about it, I think it is currently hard to use for cross validation with thundersvm: the R interface of thundersvm only supports reading data from a file, so the data can't be split inside R functions. We'll improve the R interface in the future.

We have added cross validation parameters to the R interface in the latest version of ThunderSVM, so you can now set the number of folds for cross validation. Please give it a try.

Please help us improve the R interface if possible. Thanks.
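Until the R interface accepts in-memory data, the split can be done at the file level instead. The following is a minimal sketch in base R, assuming the data file holds one sample per line (as in the LIBSVM format). The helper `write_folds` and the `foldN.train` / `foldN.test` file names are hypothetical and not part of thundersvm.

```r
# Hypothetical helper (not part of thundersvm): split a one-sample-per-line
# data file into k train/test file pairs for manual cross validation.
write_folds <- function(data_file, k = 5, out_dir = tempdir(), seed = 1) {
  lines <- readLines(data_file)
  n <- length(lines)
  stopifnot(k >= 2, n >= k)
  set.seed(seed)  # note: this resets the global RNG state
  # Assign each sample to one of k folds, as evenly as possible.
  fold_id <- sample(rep(seq_len(k), length.out = n))
  lapply(seq_len(k), function(i) {
    train_file <- file.path(out_dir, sprintf("fold%d.train", i))
    test_file  <- file.path(out_dir, sprintf("fold%d.test", i))
    writeLines(lines[fold_id != i], train_file)  # all folds except fold i
    writeLines(lines[fold_id == i], test_file)   # held-out fold i
    list(train = train_file, test = test_file)
  })
}
```

Each returned pair of paths could then be handed to whatever file-based training and prediction calls the R interface exposes, and the per-fold errors averaged by hand.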

mjmg commented 6 years ago

Thanks for the clarification. Possible improvements to thundersvm include any of the following:

1) Accepting native data types in each supported interface (Python, MATLAB, R). This is useful for benchmarking only the SVM-related operations in thundersvm, without the overhead of reading and writing files.

2) The caret package in R is essentially a wrapper that provides a high-level interface for common machine learning operations (data set splitting, cross validation, training and prediction). If thundersvm were refactored so that it could be used as a custom caret model, it would be easier to compare thundersvm's performance with the other machine learning algorithms supported by caret.

3) Thundersvm in R could be refactored along the lines of another GPU-accelerated SVM for R, RgtSVM (https://github.com/Danko-Lab/Rgtsvm), which masks the default svm routines of the libSVM-based e1071 package. That makes it backwards compatible: existing code written for e1071 needs no modification. e1071 also has high-level functions (tune/best.svm) that provide an easier interface and abstract the grid search for the best model. Note that RgtSVM is not available for Windows platforms; it would also be nice to compare the performance of GT SVM with that of thundersvm.
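For reference, the e1071 grid search mentioned in point 3 looks roughly like the sketch below. It uses the CPU-based e1071 package only, not thundersvm; the built-in iris data set and the gamma/cost ranges are arbitrary choices for illustration.

```r
# Requires the e1071 package; iris is a built-in R data set.
library(e1071)

# Grid search over gamma and cost, scored by 5-fold cross validation.
tuned <- tune.svm(Species ~ ., data = iris,
                  gamma = 10^(-3:0), cost = 10^(0:2),
                  tunecontrol = tune.control(sampling = "cross", cross = 5))

tuned$best.parameters     # gamma/cost pair with the lowest CV error
best <- tuned$best.model  # svm model trained on the full data with that pair
predict(best, head(iris))
```

A drop-in replacement in the style of RgtSVM would let code like this run unchanged while the underlying svm calls are served by the GPU implementation.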

QinbinLi commented 6 years ago

Thanks for your feedback! Your suggestions will be very useful for future improvements to thundersvm.

asheetal commented 3 years ago

I was actually able to integrate thundersvm into caret as a custom model using the reticulate Python bindings. It is not very elegant, but it gives me a model that uses caret's cross validation, and the GPU speedup seems worth the Python overhead. One caveat: the model cannot be saved without somehow keeping the Python process resident in RAM.

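# caret "fit" function for a custom model: trains a ThunderSVM regressor
# through the Python bindings (needs the reticulate R package and the
# thundersvm Python package).
rvmGPUFit <- function(x, y, wts, param, lev, last, weights, classProbs, verbose = TRUE, ...) {
  thundersvm <- reticulate::import("thundersvm")
  reg <- thundersvm$SVR(gamma = param$gamma,
                        C = param$C,
                        verbose = verbose)
  # Pass plain numeric data so that reticulate hands NumPy arrays to Python.
  reg$fit(as.matrix(x), as.numeric(y))
  reg  # return the fitted Python object explicitly
}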
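A fit function on its own is not enough for caret: a custom model is a list that also describes the tuning parameters, a default grid and a predict function. The sketch below shows one way the pieces might fit together. It assumes the caret and reticulate R packages and the thundersvm Python package are installed at training time; the name `thundersvmSVR`, the labels and the grid values are hypothetical placeholders.

```r
# Sketch of a caret custom model around the Python thundersvm SVR.
# Defining the list needs only base R; training needs caret, reticulate
# and the thundersvm Python package.
thundersvmSVR <- list(
  label = "ThunderSVM SVR (via reticulate)",
  library = "reticulate",
  type = "Regression",
  parameters = data.frame(parameter = c("gamma", "C"),
                          class = c("numeric", "numeric"),
                          label = c("Gamma", "Cost")),
  # Default tuning grid; the value ranges are arbitrary placeholders.
  grid = function(x, y, len = NULL, search = "grid") {
    expand.grid(gamma = 10^(-3:-1), C = 10^(0:2))
  },
  fit = function(x, y, wts, param, lev, last, weights, classProbs, ...) {
    tsvm <- reticulate::import("thundersvm")
    reg <- tsvm$SVR(gamma = param$gamma, C = param$C)
    reg$fit(as.matrix(x), as.numeric(y))
    reg
  },
  predict = function(modelFit, newdata, preProc = NULL, submodels = NULL) {
    as.numeric(modelFit$predict(as.matrix(newdata)))
  },
  prob = NULL,  # class probabilities do not apply to regression
  sort = function(x) x[order(x$C, x$gamma), ]
)
```

With the packages in place, something like `caret::train(x, y, method = thundersvmSVR, trControl = caret::trainControl(method = "cv", number = 5))` should then run caret's cross validation over the grid. The fitted object still wraps a Python pointer, so the saving limitation described above applies here as well.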