Perf test and deep dive into Elastic Net

bchen1116 commented 3 years ago

PR 2269 put up a fix for the Elastic Net classifier that partially solves the issues we had with the fraud/lead-scoring demos. However, we want to look further into this Elastic Net classifier. We have 2 main questions:

Are we incurring a double shrinkage using this classifier, as specified here? This could explain the poor performance we have been seeing.
Can we use scikit-learn's implementation of Elastic Net rather than our current use of SGDClassifier? How does this change the performance of our classifier? (Run perf tests here.)
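For context on the first question, here is a minimal sketch of what "double shrinkage" usually means, assuming the term is used in the sense of the naive elastic net from Zou and Hastie. It uses the closed form for a single coefficient under an orthonormal design with objective ||y - Xb||^2 + lam2 * ||b||^2 + lam1 * ||b||_1. The function names are hypothetical, and nothing here claims that SGDClassifier behaves one way or the other; it only shows the effect we would be checking for.

```python
def soft_threshold(z, t):
    """Lasso step: move z toward zero by t, stopping at zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def naive_elastic_net(beta_ols, lam1, lam2):
    """Naive elastic net coefficient (orthonormal design assumed).

    The L1 part soft-thresholds the OLS coefficient, then the L2 part
    divides by (1 + lam2): two shrinkage steps stacked on each other.
    """
    return soft_threshold(beta_ols, lam1 / 2) / (1 + lam2)


def rescaled_elastic_net(beta_ols, lam1, lam2):
    """Naive estimate multiplied by (1 + lam2) to undo the second shrinkage."""
    return (1 + lam2) * naive_elastic_net(beta_ols, lam1, lam2)


beta_ols, lam1, lam2 = 2.0, 1.0, 1.0  # illustrative values only
lasso_only = soft_threshold(beta_ols, lam1 / 2)
naive = naive_elastic_net(beta_ols, lam1, lam2)
rescaled = rescaled_elastic_net(beta_ols, lam1, lam2)
print(lasso_only, naive, rescaled)
```

With these illustrative numbers the naive estimate is half the lasso-only estimate, while the rescaled estimate matches it. A similar gap between implementations would be one thing to look for in the perf tests.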

Discussion chain in the comments on this doc

TODO:

Perf test new Elastic Net
Figure out which implementation to use moving forward, and look further into the performance to make sure we aren't subjecting our estimator to double shrinkage

freddyaboulton commented 3 years ago

The new default values for elastic net introduced in #2269 make AutoMLSearch throw an error on one of the training folds for the elastic net pipeline on the cjs dataset.

Repro

import pandas as pd
from evalml import AutoMLSearch

X = pd.read_csv("/Users/freddy.boulton/Downloads/cjs.csv")
y = X.pop("TR")
automl = AutoMLSearch(X, y, "multiclass")  # no overrides, so the new Elastic Net defaults from #2269 are used
automl.search()

Using the old values fixes the issue:

automl = AutoMLSearch(X, y, "multiclass", pipeline_parameters={"Elastic Net Classifier": dict(alpha=0.5, l1_ratio=0.5)})
automl.search()

Looks like it's tough to come up with reasonable defaults for this classifier. @bchen1116 I like the idea of switching the estimator class but I don't think the sklearn estimator you linked to has a predict_proba method.

Can we use LogisticRegressionClassifier but hard-code the penalty to be elasticnet? Do we even need ElasticNetClassifier if LogisticRegressionClassifier can do what it can do?
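To make the suggestion concrete, here is a rough sketch at the scikit-learn level rather than through the evalml component. It checks that sklearn's ElasticNet regressor has no predict_proba, then fits a LogisticRegression with the elasticnet penalty, which does. The synthetic data is a stand-in for cjs.csv, and the C, l1_ratio, and max_iter values are arbitrary placeholders, not proposed defaults.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import ElasticNet, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class data; a stand-in, not the cjs dataset.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# ElasticNet is a regressor, so it does not expose predict_proba.
has_proba = hasattr(ElasticNet(), "predict_proba")

# Logistic regression with an elastic net penalty. saga is the solver that
# supports this penalty; scaling the features first helps it converge.
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(
        penalty="elasticnet",
        solver="saga",
        l1_ratio=0.5,   # placeholder mix of L1 and L2
        C=1.0,          # placeholder inverse regularization strength
        max_iter=5000,
        random_state=0,
    ),
)
clf.fit(X, y)
proba = clf.predict_proba(X)  # one column of probabilities per class
print(has_proba, proba.shape)
```

Note that LogisticRegression is parametrized by C (inverse regularization strength) rather than alpha, so the alpha values from the current component would not carry over unchanged.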

cjs.csv

bchen1116 commented 3 years ago

Good point @freddyaboulton! I'll look into it and see if the performance is similar!

alteryx / evalml

Perf test and deep dive into Elastic Net #2289