dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License
9.04k stars 1.89k forks source link

Create page with info about trainers #5896

Open justinormont opened 3 years ago

justinormont commented 3 years ago

Currently, there is not a unified page to go to to see a summary of what each trainer is good at, and its features/limitations.

Here's a start which I made with @glebuk a while back: image

The cells without information (empty cells) are simply not filled in. Specifically, it's a missing answer.

In Excel form: ML.NET Trainer Cheatsheet.xlsx

Table in Markdown form

Tree Ensemble Linear META Baseline Other
GAMs FastForest Fast Tree LightGBM SDCA Sym SGD LbFGS Averaged Perceptron OlsTrainer OGD OVA Pairwise Coupling Prior Trainer KMeansTrainer PCA Linear SVM Field Aware FM Naiive Bayes
Tasks Binary Classification ★★ ★★ ★★
Multi-class Classification ★ [3] ★ [3] ★ [3] ★★ ★★ ★ [3] ★ [3] ★ [3] ★ [3]
Regression
Clustering
Anomaly Detection
Ranking ★★
Good for Features Ngram Featurized Text ★★ ★★ ★★★ ★★
Word Embedding ★★ ★★ ★★★ ★★ ★★★
Categorical ★★ ★★ ★★ ★★ ★★ ★★ ★★★
Numeric ★★ ★★
Featurized Image ★★★ ★★ ★★
Accuracy Good first try ★★ ★★ ★★
Average Accuracy ★★ ★★★ ★★★½ ★★ ★★ ★★ ½
Is Tunable ★★★ ★★
Speed Speed ½ ★★ ★★½ ★★★ ★★★★ ★★ ★★★★ ★★★★ ★★ ½ ½ ★★★ ★★★★
Streaming (data > RAM) ★ [2] ★ [2] ★ [4]
RAM (bytes) per Row row row row row 8 row row row feat*k*8 row row ?
~Scalability for #features 10^5 10^5 10^5 10^5 10^9 10^9 10^9 10^9  ∞ 10^4 10^5 ?
Benefits from Caching ★?
Model Explainable Model ★★★ ★★ ★★ ★★ ★★ ★★ ★★ ?
Exportable to ONNX ?
Can be turned into Code ? ?
Requires Normalization ★ [1] ★? ? ★ [5]
Returns Probability
Online Updatable ?
Model Size (bytes/class) ~ ~ ~ ~ feat*8 feat*8 feat*8 feat*8 feat*8 feat*8 0 feat*8*k feat*8 feat*8

Notes: [1] k-means needs normalization, but you can also up weight certain features by giving them a larger scale [2] ova & pkpd are streamable if the underlying trainers are [3] most binary trainers handle multi-class classification tasks when used with OVA or PKPD [4] naive bayes stops reading at 2B rows (is fixed?) [5] naive bayes does better on numeric data with MeanVar w/ fixZero=false, or whitening (both move the mean to zero; though will densify ngrams/categorical which is very slow)

briacht commented 3 years ago

Thanks @justinormont !! I can work on adding a new file for this as part of the repo clean up

@luisquintanilla Is there anywhere in Docs we could add this to as well?

jwood803 commented 3 years ago

@briacht @luisquintanilla Hope y'all don't mind me chiming in but would this doc be what you need? https://docs.microsoft.com/en-us/dotnet/machine-learning/resources/tasks

justinormont commented 3 years ago

@jwood803: Looks like an appropriate place. This may also work: https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-choose-an-ml-net-algorithm