gmum / gmum.r

GMUM machine learning group R package
http://r.gmum.net

Prediction method for all SVMRunners #209

Closed igorsieradzki closed 9 years ago

igorsieradzki commented 9 years ago

Like in SVMLight. Call @igorsieradzki in case of any questions.

ktalik commented 9 years ago

The method cannot be implemented yet, since SVMConfiguration does not hold sufficient data. All LibSVM types should be converted to armadillo matrices, vectors, etc.

Reopening #81, since it seems that not everything is done (e.g. support vectors), or at least not everything is commented. For example (???):

//universal parameters
arma::vec w; //d

#81 is a blocker here.

igorsieradzki commented 9 years ago

Before I merge the PR, could you quickly summarize the status of both the svmlight and libsvm prediction methods?

Edit: I can see that we have 'our' prediction only for the linear kernel.
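For context, prediction with a linear kernel only needs the weight vector `w` and a bias term. Below is a minimal sketch, assuming plain `std::vector` instead of armadillo types; the function names are hypothetical and are not taken from SVMClient. Note that libraries differ in the sign convention for the bias (LibSVM stores `rho` and subtracts it), so `b` here stands for whatever value is added to `w . x`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not gmum.r API): decision value w . x + b.
double linear_decision(const std::vector<double>& w, double b,
                       const std::vector<double>& x) {
    assert(w.size() == x.size());
    double s = b;
    for (std::size_t i = 0; i < w.size(); ++i) {
        s += w[i] * x[i];
    }
    return s;
}

// Hypothetical helper: map the decision value to a label in {-1, +1}.
int linear_predict(const std::vector<double>& w, double b,
                   const std::vector<double>& x) {
    return linear_decision(w, b, x) >= 0.0 ? 1 : -1;
}
```

For non-linear kernels no explicit `w` exists in input space, which is why the support vectors and their coefficients have to be stored as well.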

ktalik commented 9 years ago

As @kudkudak mentioned in #304, we are discussing the current issue here. Changing the milestone due to a bug discovered by @igorsieradzki. I prefer to wait for someone more familiar with R to investigate the SVM data logic in the R part. Meanwhile, I am writing the remaining kernels.

igorsieradzki commented 9 years ago

I've pushed a small commit to the svm-wrapper branch. I recommend merging it into your branches; it should fix the strange integers bug.

Basically, what happened: after changing the data format to .RData, the breast_cancer data got saved as factor type, meaning that when we convert it to a matrix, we get the integer level codes of the values, not the values themselves.

No. More. Factors.

ktalik commented 9 years ago

Many many thanks for investigating! I am happy that my speculations were good. Getting your code immediately!

ktalik commented 9 years ago

Linear, Poly, RBF and Sigmoid are now available in SVMClient.
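For reference, the four kernels in their commonly used LibSVM-style form are: linear `u . v`, polynomial `(gamma * u . v + coef0)^degree`, RBF `exp(-gamma * |u - v|^2)`, and sigmoid `tanh(gamma * u . v + coef0)`. The following is a minimal sketch under that assumption, using `std::vector` rather than armadillo; the function and parameter names are illustrative and are not the actual SVMClient code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Dot product of two equally sized vectors.
double dot(const Vec& u, const Vec& v) {
    assert(u.size() == v.size());
    double s = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i) {
        s += u[i] * v[i];
    }
    return s;
}

// Linear kernel: u . v
double linear_kernel(const Vec& u, const Vec& v) {
    return dot(u, v);
}

// Polynomial kernel: (gamma * u . v + coef0)^degree
double poly_kernel(const Vec& u, const Vec& v,
                   double gamma, double coef0, int degree) {
    return std::pow(gamma * dot(u, v) + coef0, degree);
}

// RBF kernel: exp(-gamma * |u - v|^2)
double rbf_kernel(const Vec& u, const Vec& v, double gamma) {
    assert(u.size() == v.size());
    double sq = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i) {
        const double d = u[i] - v[i];
        sq += d * d;
    }
    return std::exp(-gamma * sq);
}

// Sigmoid kernel: tanh(gamma * u . v + coef0)
double sigmoid_kernel(const Vec& u, const Vec& v,
                      double gamma, double coef0) {
    return std::tanh(gamma * dot(u, v) + coef0);
}
```

When comparing accuracies across libraries, it is worth checking that `gamma`, `coef0` and `degree` are set to the same values everywhere, since default values may differ between implementations.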

Here are the results from the new source('tests/testthat/benchmark.R'):

[1] "0. linear kernel:"
[1] "e1071 acc: 0.972182"
[1] "gmum libsvm acc: 0.970717"
[1] "gmum svmlight acc: 0.972182"
[1] "gmum libsvm 2e acc: 0.972182"
[1] "1. poly kernel:"
[1] "e1071 acc: 0.998536"
[1] "gmum libsvm acc: 0.970717"
[1] "gmum svmlight acc: 0.972182"
[1] "gmum libsvm 2e acc: 0.975110"
[1] "2. rbf kernel:"
[1] "e1071 acc: 0.998536"
[1] "gmum libsvm acc: 0.985359"
[1] "gmum svmlight acc: 0.970717"
[1] "gmum libsvm 2e acc: 0.961933"
[1] "3. sigmoid kernel:"
[1] "e1071 acc: 0.948755"
[1] "gmum libsvm acc: 0.967789"
[1] "gmum svmlight acc: 0.973646"
[1] "gmum libsvm 2e acc: 0.973646"

So, there are some observations, according to the results and my current knowledge:

  1. LibSVM accuracy for the Linear kernel is slightly lower than the others (0.970717 vs. 0.972182). Maybe LibSVMRunner is doing something additional that I am not aware of, or the parameters are wrong. Two solutions:

    • I can take a closer look at the LibSVMRunner code and correct the SVMClient::predict() implementation,
    • or I suggest that @sacherus take a look at SVMClient::predict() and comment on it or change it (remember not to introduce conflicts with SVMLight).

    (I'll try to investigate as fast as possible)

  2. Overall, Poly and RBF accuracies are worse than e1071's. Maybe someone can take a look at the equations? (@igorsieradzki, @kudkudak, anyone)
  3. Sigmoid kernel results are better than e1071's, which is quite interesting.

ktalik commented 9 years ago

After adding klaR::svmlight, here are the current results of the benchmark:

[1] "0. linear kernel:"
[1] "e1071 acc: 0.972182"
[1] "gmum libsvm acc: 0.970717"
[1] "klaR svmlight acc: 0.972182"
[1] "gmum svmlight acc: 0.972182"
[1] "gmum libsvm 2e acc: 0.972182"
[1] "1. poly kernel:"
[1] "e1071 acc: 0.998536"
[1] "gmum libsvm acc: 0.970717"
[1] "klaR svmlight acc: 0.975110"
[1] "gmum svmlight acc: 0.972182"
[1] "gmum libsvm 2e acc: 0.975110"
[1] "2. rbf kernel:"
[1] "e1071 acc: 0.998536"
[1] "gmum libsvm acc: 0.985359"
[1] "klaR svmlight acc: 0.970717"
[1] "gmum svmlight acc: 0.970717"
[1] "gmum libsvm 2e acc: 0.961933"
[1] "3. sigmoid kernel:"
[1] "e1071 acc: 0.948755"
[1] "gmum libsvm acc: 0.967789"
[1] "klaR svmlight acc: 0.973646"
[1] "gmum svmlight acc: 0.973646"
[1] "gmum libsvm 2e acc: 0.973646"

(Many lines of klaR::svmlight's prediction output omitted. I couldn't find a way to mute it, as noted in the source file.)

Questions: I think something might be wrong in the Poly kernel calculations? And I still need to investigate the LibSVM error.

kudkudak commented 9 years ago

:+1: for comparison with klaR

igorsieradzki commented 9 years ago

Btw. I don't know if this is common knowledge, but e1071 scales data, so that might be the reason for higher accuracy.
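To make the comparison fair, the same scaling could be applied before training the gmum models. Here is a minimal sketch of column-wise standardization (zero mean, unit variance, using the sample standard deviation), assuming a row-major `std::vector` matrix; `standardize_columns` is a hypothetical helper, not part of gmum.r or e1071.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Standardize every column in place: subtract the column mean and divide
// by the sample standard deviation (n - 1 in the denominator).
// Constant columns are only centered, to avoid dividing by zero.
void standardize_columns(Matrix& X) {
    const std::size_t n = X.size();
    assert(n >= 2);
    const std::size_t d = X[0].size();
    for (std::size_t j = 0; j < d; ++j) {
        double mean = 0.0;
        for (const auto& row : X) {
            assert(row.size() == d);
            mean += row[j];
        }
        mean /= static_cast<double>(n);

        double ss = 0.0;
        for (const auto& row : X) {
            const double diff = row[j] - mean;
            ss += diff * diff;
        }
        const double sd = std::sqrt(ss / static_cast<double>(n - 1));

        for (auto& row : X) {
            row[j] -= mean;
            if (sd > 0.0) {
                row[j] /= sd;
            }
        }
    }
}
```

In a real benchmark the means and standard deviations computed on the training set would be stored and reused for the test set, rather than recomputed on the test data.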

ktalik commented 9 years ago

Hey... Thanks for the info! In that case I think this task is done [1]. More elaborate comparisons / tests / accuracies should go to #312 (if it is not done yet).

[1] I've been studying the LibSVMRunner code in comparison with LibSVM's prediction calculations, and in my opinion all parameters are being stored correctly within SVMConfiguration.

Here we go with pull request.