dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Add missing trainers to AutoML BinaryClassificationExperiment #5609

Open jpepg4 opened 3 years ago

jpepg4 commented 3 years ago

System information

AutoML 0.17.4

Issue

AutoML's BinaryClassificationExperiment currently lacks support for several binary classification trainers (LDSVM, GAM, etc.). Adding LDSVM in particular would be helpful, because it has a large number of parameters to tune and does not really overlap with the existing trainers.

justinormont commented 3 years ago

The missing trainers should be available for users to enable, but most of them should be disabled by default.

AutoML's default search space is meant to be efficient. GAM, for instance, is very slow and will rarely beat FastTree or LightGBM. GAM is designed for explainability, so a user may want to enable it manually, perhaps alongside linear models, in use cases where explainability is key.

LDSVM is an exception: it was added to ML.NET after AutoML's search space was created. Enabling LDSVM by default would be beneficial, as it can do well on text/n-gram features.

List of current binary trainers used in AutoML: https://github.com/dotnet/machinelearning/blob/5dbfd8acac0bf798957eea122f1413209cdf07dc/src/Microsoft.ML.AutoML/API/BinaryClassificationExperiment.cs#L88-L137

Notes: OVA-LDSVM (one-versus-all LDSVM) should also be added to multiclass AutoML. When adding a new trainer, make sure it also works in CodeGen.

michaelgsharp commented 3 years ago

@justinormont So the action item here is to add OVA-LDSVM to AutoML and test it, correct? Are there any other trainers that should be added as well?

justinormont commented 3 years ago

Should add, and have on by default:

Should add, and have off by default:

Adding the rest with them off by default keeps our search space efficient while still allowing users to enable those trainers manually, for instance if they want to optimize GAM (which is currently not an available choice).