alteryx / evalml

EvalML is an AutoML library written in python.
https://evalml.alteryx.com
BSD 3-Clause "New" or "Revised" License
Perf test and deep dive into Elastic Net #2289

Closed by bchen1116 3 years ago

bchen1116 commented 3 years ago

PR #2269 added a fix for the Elastic Net classifier that partially solves the issues we had with the fraud and lead-scoring demos. However, we want to look further into this classifier. We have 2 main questions:

Discussion chain in the comments on this doc

TODO:

freddyaboulton commented 3 years ago

The new default values for elastic net introduced in #2269 make AutoMLSearch throw an error on one of the training folds for the elastic net pipeline on the cjs dataset.

Repro

import pandas as pd
from evalml import AutoMLSearch

X = pd.read_csv("/Users/freddy.boulton/Downloads/cjs.csv")
y = X.pop("TR")
automl = AutoMLSearch(X, y, "multiclass", pipeline_parameters={"Elastic Net Classifier": dict(alpha=0.5, l1_ratio=0.5)})
automl.search()

Using the old values fixes the issue:

automl = AutoMLSearch(X, y, "multiclass", pipeline_parameters={"Elastic Net Classifier": dict(alpha=0.5, l1_ratio=0.5)})
automl.search()

Looks like it's tough to come up with reasonable defaults for this classifier. @bchen1116 I like the idea of switching the estimator class but I don't think the sklearn estimator you linked to has a predict_proba method.

Can we use LogisticRegressionClassifier but hard-code the penalty to be elasticnet? Do we even need ElasticNetClassifier if LogisticRegressionClassifier can already do everything it does?
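For reference, here is a rough sketch of what that could look like in plain scikit-learn, outside of EvalML. It assumes a scikit-learn version where LogisticRegression accepts penalty="elasticnet" (which needs solver="saga" and an l1_ratio). The synthetic data is only a stand-in for cjs.csv, and C=1.0 is an arbitrary choice, not a proposed default. Note that C is an inverse regularization strength, so it does not map one-to-one onto alpha.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class data; an assumption standing in for cjs.csv.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# Elastic net penalty on logistic regression: requires the saga solver
# and an explicit l1_ratio (0 = pure L2, 1 = pure L1).
clf = make_pipeline(
    StandardScaler(),  # saga converges faster on scaled features
    LogisticRegression(
        penalty="elasticnet",
        solver="saga",
        l1_ratio=0.5,
        C=1.0,  # arbitrary illustrative value
        max_iter=5000,
        random_state=0,
    ),
)
clf.fit(X, y)

# Unlike some linear estimators, this one exposes class probabilities.
proba = clf.predict_proba(X)
print(proba.shape)
```

If this holds up, the existing LogisticRegressionClassifier component might only need its penalty and solver pinned, though whether performance matches the current ElasticNetClassifier would still need to be checked.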

cjs.csv

bchen1116 commented 3 years ago

Good point @freddyaboulton! I'll look into it and see if the performance is similar!