alteryx / evalml

EvalML is an AutoML library written in Python.
https://evalml.alteryx.com
BSD 3-Clause "New" or "Revised" License
735 stars 83 forks

Add dimensionality reduction to AutoMLSearch #2747

Open eccabay opened 2 years ago

eccabay commented 2 years ago

Investigating the performance of one of our perf test datasets in #2628 revealed that the dataset has too many dimensions relative to its number of data points, and performance suffers significantly as a result. We have dimensionality reduction components (both PCA and LDA), but right now there is no convenient way to add them to AutoMLSearch. I see two different ways we could make this easier for high-dimensional datasets:

chukarsten commented 2 years ago

@eccabay Thanks for submitting! We're going to ice this for now until product/customer demand catches up and we can prioritize this a little better. Great idea.

eccabay commented 2 years ago

The motivation for this issue stems from this comment. Some datasets, including restaurants.csv from issue #2628, are almost doomed to fail thanks to the curse of dimensionality. Adding dimensionality reduction should significantly improve performance on these sorts of datasets.
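
For readers unfamiliar with the curse of dimensionality, here is a rough illustration using only the Python standard library. It is a sketch under stated assumptions, not EvalML code: the data is uniform random noise rather than restaurants.csv, and `relative_contrast` is a hypothetical helper name introduced just for this example. It measures how much farther the farthest point is from a query than the nearest one. With a fixed number of rows, that contrast tends to collapse as columns are added, which is one reason distance- and density-based learners struggle when features greatly outnumber data points.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (farthest - nearest) / nearest for distances from one random
    query point to n_points random points in the unit hypercube.

    Hypothetical helper for illustration only; the data is synthetic.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, point))  # Euclidean distance
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


if __name__ == "__main__":
    # Same number of rows each time; only the number of columns grows.
    for n_dims in (2, 20, 200, 2000):
        print(n_dims, round(relative_contrast(100, n_dims), 3))
```

When the contrast is close to zero, every point looks roughly equally far from every other point. Projecting onto fewer dimensions (for example with PCA) is one common way to restore some of that contrast before modeling.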