EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

Implement more classifier pipeline operators #39

Closed rhiever closed 8 years ago

rhiever commented 8 years ago

Similar to the Decision Tree and Random Forest classifier pipeline operators, also implement:

@rasbt, do you think we should add any more than this? I'd like to add ANNs eventually, but since they're not directly supported in sklearn, that will wait for a later time.

rhiever commented 8 years ago

Just to bug you even more, @rasbt, check out https://github.com/rhiever/tpot/commit/96c69a4fd260e1dfeeed1e360943a86803204cb7 for the implementation of these new classifiers.

Any other parameters that you would include?

Would you implement any parameters differently?

rhiever commented 8 years ago

@amueller, do you have any advice on what additional model parameters to open up to evolution?

Chris7 commented 8 years ago

Nice to see classifiers are coming along!

One thing I've noticed is you have a ton of repetitive code and it seems like you could abstract the implementation of all these methods to just a generic_regressor/generic_classifier function and a common structure for storing the regressors/parameters. Have you tried any generic approach yet?
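The abstraction Chris7 suggests could look something like the sketch below: a registry mapping operator names to estimator classes, plus one generic fit/predict helper. All names here (`CLASSIFIERS`, `generic_classifier`) are hypothetical illustrations, not TPOT's actual internals.

```python
# Hypothetical sketch of a generic classifier operator: one common
# structure for storing estimators, one shared fit/predict routine.
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Registry of supported classifiers, keyed by operator name.
CLASSIFIERS = {
    'decision_tree': DecisionTreeClassifier,
    'random_forest': RandomForestClassifier,
}

def generic_classifier(name, X_train, y_train, X_test, **params):
    """Instantiate the named classifier with the given (evolved)
    parameters, fit it on the training data, and return predictions
    for X_test."""
    model = CLASSIFIERS[name](**params)
    model.fit(X_train, y_train)
    return model.predict(X_test)
```

Each per-model operator then reduces to a thin wrapper that validates its parameters and delegates to `generic_classifier`, so adding a new model is a one-line registry entry rather than a copy-pasted method.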

amueller commented 8 years ago

ANNs are in dev. They don't have dropout yet, but will soon.

How about gradient boosting?

rhiever commented 8 years ago

@Chris7: I noticed that today too when I was implementing the new classifiers. I'll look into abstracting the common bits next week, as that would indeed save quite a lot of repeated code.

@amueller: :+1: Looking forward to ANNs in sklearn. I'll add gradient boosting as well. Do you have a sense of what the 2-3 most important parameters are (if there are that many) for each model? I've tinkered with various parameters for various models, but I don't have as comprehensive a view of them as you might.

amueller commented 8 years ago

We want to add a "Default grid" https://github.com/scikit-learn/scikit-learn/pull/5564 but it is somewhat stalled. I'm crazy busy at the moment but I hope to work on that soonish.

rhiever commented 8 years ago

Nice. Looks like this will be a good start. Thank you @amueller!

```python
_DEFAULT_PARAM_GRIDS = {'AdaBoostClassifier':
                        [{'learning_rate': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'AdaBoostRegressor':
                        [{'learning_rate': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'DecisionTreeClassifier':
                        [{'max_features': ["auto", None]}],
                        'DecisionTreeRegressor':
                        [{'max_features': ["auto", None]}],
                        'ElasticNet':
                        [{'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'GradientBoostingClassifier':
                        [{'max_depth': [1, 3, 5]}],
                        'GradientBoostingRegressor':
                        [{'max_depth': [1, 3, 5]}],
                        'KNeighborsClassifier':
                        [{'n_neighbors': [1, 5, 10, 100],
                          'weights': ['uniform', 'distance']}],
                        'KNeighborsRegressor':
                        [{'n_neighbors': [1, 5, 10, 100],
                          'weights': ['uniform', 'distance']}],
                        'Lasso':
                        [{'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'LinearRegression':
                        [{}],
                        'LinearSVC':
                        [{'C': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'LogisticRegression':
                        [{'C': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'SVC': [{'C': [0.01, 0.1, 1.0, 10.0, 100.0],
                                 'gamma': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'MultinomialNB':
                        [{'alpha': [0.1, 0.25, 0.5, 0.75, 1.0]}],
                        'RandomForestClassifier':
                        [{'max_depth': [1, 5, 10, None]}],
                        'RandomForestRegressor':
                        [{'max_depth': [1, 5, 10, None]}],
                        'Ridge':
                        [{'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'SGDClassifier':
                        [{'alpha': [0.000001, 0.00001, 0.0001, 0.001, 0.01],
                          'penalty': ['l1', 'l2', 'elasticnet']}],
                        'SGDRegressor':
                        [{'alpha': [0.000001, 0.00001, 0.0001, 0.001, 0.01],
                          'penalty': ['l1', 'l2', 'elasticnet']}],
                        'LinearSVR':
                        [{'C': [0.01, 0.1, 1.0, 10.0, 100.0]}],
                        'SVR':
                        [{'C': [0.01, 0.1, 1.0, 10.0, 100.0],
                          'gamma': [0.01, 0.1, 1.0, 10.0, 100.0]}]}
```
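For context, a default grid like the one above plugs straight into scikit-learn's grid search: look up the grid by the estimator's class name and hand it to `GridSearchCV`. The lookup-by-class-name convention and the `grid_search_with_defaults` helper below are assumptions sketched from the proposed PR, not a released sklearn API.

```python
# Hedged sketch: using a class-name-keyed default grid with GridSearchCV.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Abbreviated copy of the proposed default grids (one entry shown).
_DEFAULT_PARAM_GRIDS = {
    'LogisticRegression': [{'C': [0.01, 0.1, 1.0, 10.0, 100.0]}],
}

def grid_search_with_defaults(estimator, X, y, cv=3):
    """Look up the default grid by the estimator's class name and
    run an exhaustive grid search over it."""
    grid = _DEFAULT_PARAM_GRIDS[type(estimator).__name__]
    search = GridSearchCV(estimator, grid, cv=cv)
    search.fit(X, y)
    return search

X, y = load_iris(return_X_y=True)
search = grid_search_with_defaults(LogisticRegression(max_iter=1000), X, y)
print(search.best_params_)  # the best C value found on the grid
```

For TPOT the same tables could seed the evolutionary search space instead of a grid, since each value list doubles as the set of alleles a parameter can take.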
amueller commented 8 years ago

Feedback welcome. I haven't actually reviewed this; not sure if someone else has ;)

rhiever commented 8 years ago

I'll drop some comments in there.

rhiever commented 8 years ago

Going to close this issue and open a new one for expanding classifier parameter search.