cjlin1 / libsvm

LIBSVM -- A Library for Support Vector Machines
https://www.csie.ntu.edu.tw/~cjlin/libsvm/
BSD 3-Clause "New" or "Revised" License

Use libsvm in hadoop #77

Open rendi7936 opened 7 years ago

rendi7936 commented 7 years ago

Hello, everyone. I want to ask something. Can I use libsvm in Apache Hadoop? Does it work with the MapReduce programming model in Hadoop?

infwinston commented 7 years ago

Why do you want to use libsvm on Hadoop? I think it might be improper to apply kernel SVMs in a MapReduce setting, because the current kernel SVM solver cannot handle very large datasets.
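To make the scaling concern concrete: a kernel SVM's dual problem involves an n × n kernel matrix over the n training instances. The sketch below (plain Python; `kernel_matrix_gb` is a hypothetical helper, not part of LIBSVM) only estimates how much memory that full matrix would need with 8-byte doubles. LIBSVM itself does not store the whole matrix; it uses a decomposition method with a kernel cache whose size is set by the `-m` option of `svm-train` (in MB), but training cost still grows much faster than linearly with n.

```python
def kernel_matrix_gb(n_instances: int, bytes_per_entry: int = 8) -> float:
    """Memory in GB (1 GB = 10**9 bytes) for a dense n x n kernel matrix."""
    return n_instances * n_instances * bytes_per_entry / 1e9

# Quadratic growth: 10x more instances means 100x more kernel entries.
for n in (10_000, 100_000, 1_000_000):
    print(f"n = {n:>9,}: {kernel_matrix_gb(n):>10,.1f} GB")
```

With 100,000 instances the full matrix would already need about 80 GB, which is why kernel solvers rely on caching rather than holding all kernel values at once.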

rendi7936 commented 7 years ago

I want to do performance analysis in Hadoop and Spark using an SVM algorithm.

If I only use a dataset smaller than 1 GB, is it OK? I have read many papers saying that Hadoop can implement the SVM algorithm, but none of them explains which library they use. So I started with libSVM.

So, what should I do? Or is there another SVM library that supports the MapReduce programming model?

GerbenKD commented 7 years ago

Spark ML contains an implementation of linear SVMs, similar to, but not as comprehensive as, those in LibLINEAR. As @infwinston mentioned, SVMs with kernels, which is what LibSVM is for, are not really suited to Hadoop and Spark, since they don't scale well to the large datasets that are the reason to use Hadoop/Spark in the first place. If your dataset is not large, just use LibSVM directly.
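One way to see why linear SVMs fit Hadoop/Spark better: the subgradient of the regularized hinge-loss objective for a linear model is a sum of per-instance terms, so each data partition can compute a partial sum (map) and the partial sums can simply be added (reduce). The sketch below is a toy, pure-Python illustration of that structure using full-batch subgradient descent. It is not how Spark ML or LIBLINEAR actually train; the function names, step size, regularization constant, and toy data are all assumptions for illustration.

```python
from functools import reduce

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def partition_grad(w, partition):
    """Map step: sum of hinge-loss subgradients over one data partition."""
    g = [0.0] * len(w)
    for x, y in partition:
        if y * dot(w, x) < 1.0:  # margin violated: this instance contributes -y*x
            for j, xj in enumerate(x):
                g[j] -= y * xj
    return g

def add_vectors(a, b):
    """Reduce step: element-wise sum of two partial gradients."""
    return [ai + bi for ai, bi in zip(a, b)]

def train_linear_svm(partitions, dim, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2*||w||^2 + mean hinge loss by full-batch subgradient descent."""
    n = sum(len(p) for p in partitions)
    w = [0.0] * dim
    for _ in range(epochs):
        partials = [partition_grad(w, p) for p in partitions]  # "map" over partitions
        loss_grad = reduce(add_vectors, partials)              # "reduce" to one vector
        w = [wj - lr * (lam * wj + gj / n) for wj, gj in zip(w, loss_grad)]
    return w

# Toy data split across two hypothetical "nodes"; the last feature is a constant 1 (bias).
partitions = [
    [([2.0, 1.0, 1.0], 1), ([1.5, 2.0, 1.0], 1)],
    [([-1.0, -2.0, 1.0], -1), ([-2.0, -0.5, 1.0], -1)],
]
w = train_linear_svm(partitions, dim=3)
```

In a real cluster, `partitions` would be a distributed dataset and the list comprehension plus `reduce` would become the framework's map and reduce stages. A kernel SVM dual solver does not decompose this neatly, because its updates need kernel values between pairs of instances that may live on different nodes.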

infwinston commented 7 years ago

I think you may want to check out the LIBLINEAR web page and GitHub page. In some cases, linear SVMs give good enough predictive performance and train much faster than kernel SVMs.
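A small illustration of one reason linear models are faster: a kernel SVM evaluates its decision function as a sum over all support vectors, f(x) = Σ_i α_i y_i K(x_i, x) + b. With the linear kernel K(u, v) = u·v, that sum collapses into a single weight vector w = Σ_i α_i y_i x_i, so prediction becomes one dot product no matter how many support vectors there are. The sketch below uses made-up support vectors and coefficients (assumptions, standing in for α_i y_i) to check that both forms agree; it does not call LIBSVM or LIBLINEAR.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def kernel_decision(support_vectors, coefs, b, x, kernel=dot):
    """Kernel-SVM form: one kernel evaluation per support vector."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coefs)) + b

def collapse_to_weights(support_vectors, coefs):
    """Valid only for the linear kernel: w = sum_i coef_i * sv_i."""
    w = [0.0] * len(support_vectors[0])
    for sv, c in zip(support_vectors, coefs):
        for j, v in enumerate(sv):
            w[j] += c * v
    return w

def linear_decision(w, b, x):
    """Linear-SVM form: a single dot product, independent of the support vector count."""
    return dot(w, x) + b

# Hypothetical model: coefs play the role of alpha_i * y_i.
svs = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.0]]
coefs = [0.7, -0.4, 0.2]
b = -0.1
w = collapse_to_weights(svs, coefs)
x = [2.0, 3.0]
print(kernel_decision(svs, coefs, b, x), linear_decision(w, b, x))
```

With a nonlinear kernel such as RBF this collapse is not available, so prediction cost grows with the number of support vectors, which itself tends to grow with the training set size.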