Closed hhummel closed 8 years ago
@dhimmel this sounds like a task to me :)
Type | Algorithm | Implementation | Quality |
---|---|---|---|
Discriminant Analysis | LDA/QDA | sklearn.discriminant_analysis | |
Generalized Linear Models | Logistic regression with L1/L2 regularization | sklearn.linear_model.LogisticRegressionCV | |
Generalized Linear Models | OLS | sklearn.linear_model.LinearRegression | |
Generalized Linear Models | Ridge | sklearn.linear_model.RidgeCV | |
Generalized Linear Models | LASSO | sklearn.linear_model.LassoCV | |
Generalized Linear Models | Elastic Net | sklearn.linear_model.ElasticNetCV | |
Robust | RANSAC | sklearn.linear_model.RANSACRegressor | |
Support Vector Machine | SVM | sklearn.svm.SVC (libsvm) | |
Support Vector Machine | Linear SVM with regularization | sklearn.svm.LinearSVC (liblinear) | |
Decision Trees | CART | sklearn.tree.DecisionTreeClassifier | |
Ensemble | Bagging/Random Subspace | sklearn.ensemble.BaggingClassifier | |
Ensemble | Random Forest | sklearn.ensemble.RandomForestClassifier | |
Ensemble | AdaBoost | sklearn.ensemble.AdaBoostClassifier | |
Ensemble | Voting | sklearn.ensemble.VotingClassifier | |
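
Since the Quality column is still empty, one way to start filling it in might be a quick cross-validated comparison of a few of the listed estimators. This is only a rough sketch: the synthetic dataset from `make_classification`, the choice of four candidates, and AUROC as the metric are all assumptions for illustration, not something decided in this thread.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Synthetic binary classification data (assumption: a stand-in for the real data).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# A handful of estimators taken from the table above, with mostly default settings.
candidates = {
    "LDA": LinearDiscriminantAnalysis(),
    "Logistic regression (CV)": LogisticRegressionCV(cv=3, max_iter=1000),
    "Linear SVM": LinearSVC(max_iter=10000),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated AUROC for each candidate.
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUROC = {scores[name]:.3f}")
```

The numbers from synthetic data say nothing about how these models behave on the real problem; the point is only that the same loop could be rerun on the actual features to populate the Quality column.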
Type | Algorithm | Implementation | Quality |
---|---|---|---|
Ensemble | Stacking | Easy | |
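
For reference, a hand-rolled version of stacking could look roughly like the sketch below: base models produce out-of-fold probabilities, and a second-level model is trained on those. The base models, the logistic regression meta-learner, and the synthetic data are assumptions chosen for illustration. Newer scikit-learn releases also ship `sklearn.ensemble.StackingClassifier`, which may be the simpler route if it is available.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic binary data (assumption: placeholder for the real dataset).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

base_models = [
    LinearDiscriminantAnalysis(),
    RandomForestClassifier(n_estimators=100, random_state=0),
]

# Level-one training features: out-of-fold positive-class probabilities, so the
# meta-learner never sees a prediction made on data the base model was fit to.
meta_train = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Refit each base model on the full training set to build test-time features.
meta_test = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for m in base_models
])

# Meta-learner combines the base model probabilities.
meta_learner = LogisticRegression()
meta_learner.fit(meta_train, y_train)
stacked_accuracy = meta_learner.score(meta_test, y_test)
print(f"Stacked accuracy: {stacked_accuracy:.3f}")
```

Using out-of-fold predictions for the meta-learner's training features is the part that is easy to get wrong; fitting the base models and predicting on the same rows would leak the labels into the second level.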
I went through the supervised learning classifiers in the scikit-learn User Guide, capturing the highlights of each blurb and the URL of its documentation.
I don't have a good enough feel for the problem we are solving or the characteristics of the data, but I like what they say about linear and quadratic discriminant analysis: "These classifiers are attractive because they have closed-form solutions that can be easily computed, are inherently multiclass, have proven to work well in practice and have no hyperparameters to tune."
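
To make the "inherently multiclass" and "no hyperparameters to tune" points concrete, here is a small sketch that fits both estimators with default settings on a three-class problem. The iris dataset and 5-fold accuracy are assumptions picked only because they are convenient, not because they resemble our data.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.model_selection import cross_val_score

# Three-class toy dataset (assumption: illustrative only).
X, y = load_iris(return_X_y=True)

# Both estimators are used with their defaults; no grid search is involved.
results = {}
for name, model in [
    ("LDA", LinearDiscriminantAnalysis()),
    ("QDA", QuadraticDiscriminantAnalysis()),
]:
    results[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean accuracy = {results[name]:.3f}")
```

Both classes do expose optional settings (for example shrinkage in LDA with a non-default solver), so "no hyperparameters" is best read as "the defaults work without tuning".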
I think we now have a good handle on what's available. Closing this issue, but feel free to continue discussion.
Columns include binary classification, how hard to implement, and an estimate of quality.