Closed bchen1116 closed 3 years ago
The new default values for elastic net introduced in #2269 make AutoMLSearch throw an error on one of the training folds for the elastic net pipeline on the cjs dataset.
Repro
import pandas as pd
from evalml import AutoMLSearch
X = pd.read_csv("/Users/freddy.boulton/Downloads/cjs.csv")
y = X.pop("TR")
# No parameter override, so the Elastic Net pipeline uses the new defaults from #2269.
automl = AutoMLSearch(X, y, "multiclass")
automl.search()
Using the old values fixes the issue:
automl = AutoMLSearch(X, y, "multiclass", pipeline_parameters={"Elastic Net Classifier": dict(alpha=0.5, l1_ratio=0.5)})
automl.search()
Looks like it's tough to come up with reasonable defaults for this classifier. @bchen1116 I like the idea of switching the estimator class, but I don't think the sklearn estimator you linked to has a predict_proba method. Can we use LogisticRegressionClassifier but hard-code the penalty to be elasticnet? Do we even need ElasticNetClassifier if LogisticRegressionClassifier can do what it can do?
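For reference, a rough sketch of the idea above using scikit-learn directly rather than the evalml components. It assumes the evalml LogisticRegressionClassifier would forward penalty, solver, and l1_ratio to sklearn's LogisticRegression unchanged, which I have not checked. The synthetic data, the scaler, and the C and max_iter values are placeholders chosen for illustration, not proposed defaults.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class data standing in for the real dataset (an assumption).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# In scikit-learn, the elastic net penalty is only supported by the "saga"
# solver and needs an explicit l1_ratio. Scaling helps saga converge.
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(
        penalty="elasticnet",
        solver="saga",
        l1_ratio=0.5,   # illustrative value, not a recommended default
        C=1.0,          # illustrative value
        max_iter=5000,
        random_state=0,
    ),
)
clf.fit(X, y)

# Unlike some linear estimators, LogisticRegression exposes predict_proba.
proba = clf.predict_proba(X[:5])
print(proba.shape)
```

Note that sklearn's LogisticRegression controls regularization strength through C (an inverse strength) rather than alpha, so the existing alpha hyperparameter would not carry over one-to-one.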
Good point @freddyaboulton! I'll look into it and see if the performance is similar!
PR 2269 put up a fix for the Elastic Net classifier that partially solves the issues we had for the fraud/lead-scoring demos. However, we want to look further into this Elastic Net classifier. We have 2 main questions:
Discussion chain in the comments on this doc
TODO: