dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

How to use LinearSvm? #1673

Closed maxt3r closed 6 years ago

maxt3r commented 6 years ago

Sorry if this is not the place for questions.

Currently I'm using FastTree for binary classification, but I would like to give SVM a try and compare metrics.

All the docs mention LinearSvm, but I can't find a code example anywhere.

mlContext.BinaryClassification.Trainers does not expose any public SVM trainers. There is a LinearSvm class and a LinearSvm.TrainLinearSvm static method, but they seem to be intended for different things.

What am I missing?

Version: 0.7

rogancarr commented 6 years ago

Hi @maxt3r,

This is a great place for a question -- thanks for reaching out!

I have two answers for you: the current status of the API, and how to use LinearSVM in the meantime.

First, we have LinearSVM in the ML.NET codebase, but we do not yet have samples or the API extensions to place it in mlContext.BinaryClassification.Trainers. This is being worked through in issue #1318. I'll link this to that issue, and mark it as a bug.

In the meantime, you can use direct instantiation to get access to LinearSVM:

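// Trainer options.
var arguments = new LinearSvm.Arguments()
{
    NumIterations = 20
};
// Instantiate the trainer directly, since it is not yet exposed on mlContext.
var linearSvm = new LinearSvm(mlContext, arguments);
// Fit on the training data, then score the test data.
var svmTransformer = linearSvm.Fit(trainSet);
var scoredTest = svmTransformer.Transform(testSet);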

This will give you an ITransformer, here called svmTransformer, that you can use to operate on IDataView objects.

One note about our LinearSVM implementation: it is an implementation of PEGASOS, an online (stochastic sub-gradient) algorithm for training SVMs. See this paper for more information.
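
For readers unfamiliar with PEGASOS, here is a minimal, self-contained sketch of its basic update rule in plain C#. It is not ML.NET's implementation: the names TrainPegasos and Dot and the toy data are made up for illustration, and the sketch assumes labels of +1/-1 and leaves out the bias term, the optional projection step, and mini-batching.

```csharp
using System;

// Toy, linearly separable data (hypothetical): positives lie in the first
// quadrant, negatives are their mirror images through the origin.
var features = new[]
{
    new[] { 2.0, 1.0 }, new[] { 1.0, 2.0 }, new[] { 3.0, 3.0 },
    new[] { -1.0, -2.0 }, new[] { -2.0, -1.0 }, new[] { -3.0, -3.0 },
};
var labels = new[] { 1, 1, 1, -1, -1, -1 };

double[] weights = TrainPegasos(features, labels, lambda: 0.1, steps: 1000, seed: 0);
Console.WriteLine($"w = [{string.Join(", ", weights)}]");

// Basic PEGASOS: stochastic sub-gradient descent on the L2-regularized
// hinge loss, with step size 1 / (lambda * t) at step t.
double[] TrainPegasos(double[][] x, int[] y, double lambda, int steps, int seed)
{
    var rng = new Random(seed);
    var w = new double[x[0].Length];
    for (int t = 1; t <= steps; t++)
    {
        int i = rng.Next(x.Length);            // one random example per step
        double eta = 1.0 / (lambda * t);       // decaying step size
        double margin = y[i] * Dot(w, x[i]);   // computed before w changes
        for (int j = 0; j < w.Length; j++)
        {
            w[j] *= 1.0 - eta * lambda;        // shrinkage from the regularizer
            if (margin < 1.0)
                w[j] += eta * y[i] * x[i][j];  // hinge-loss sub-gradient step
        }
    }
    return w;
}

// Dot product of two equal-length vectors.
double Dot(double[] a, double[] b)
{
    double sum = 0.0;
    for (int j = 0; j < a.Length; j++) sum += a[j] * b[j];
    return sum;
}
```

Because each step looks at a single example, the cost per update does not grow with the size of the training set, which is what "online" refers to here. The predicted class for a new point is the sign of Dot(weights, point).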


Instance of Issue #1318

maxt3r commented 6 years ago

Thanks a lot for the quick response. Feel free to close the issue.

rogancarr commented 6 years ago

Hi @maxt3r,

I'll close the issue, but one more comment: If you want a good linear baseline to judge other solutions against, I'd suggest using SDCA rather than LinearSVM. You can reference it as mlContext.BinaryClassification.Trainers.StochasticDualCoordinateAscent. In general**, it's the fastest and best "out-of-the-box" linear solver we have.

** Of course, actual performance will depend on the dataset, but this is the "best bet" solution.