alteryx / evalml

EvalML is an AutoML library written in Python.
https://evalml.alteryx.com
BSD 3-Clause "New" or "Revised" License

Support for more generic feature selectors #321

Open angela97lin opened 4 years ago

angela97lin commented 4 years ago

Currently, we only have SelectFromModel. It would be nice to support feature selectors (e.g., SelectKBest, SelectPercentile) that don't rely on an estimator and instead select features using statistical tests.
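For reference, here is a minimal sketch of what estimator-free selection looks like with scikit-learn's `SelectKBest` and `SelectPercentile`. The synthetic dataset, the choice of `f_classif` as the scoring function, and the values of `k` and `percentile` are assumptions made for illustration; none of this is an existing EvalML component.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif

# Synthetic data (assumption): 10 numeric features, 3 of them informative.
X, y = make_classification(
    n_samples=200,
    n_features=10,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)

# Keep the 3 features with the highest ANOVA F-statistic.
# No estimator is fitted; only a per-feature statistical test is computed.
k_best = SelectKBest(score_func=f_classif, k=3)
X_k = k_best.fit_transform(X, y)

# Keep roughly the top 20% of features by the same score.
top_percentile = SelectPercentile(score_func=f_classif, percentile=20)
X_p = top_percentile.fit_transform(X, y)

print("SelectKBest kept columns:", k_best.get_support(indices=True))
print("SelectPercentile kept columns:", top_percentile.get_support(indices=True))
```

Note that `f_classif` itself expects numeric input, so categorical columns would still need encoding or a different scoring function.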

While working on the CatBoost PR (#247), I was not able to use a feature selector in the CatBoost pipeline. CatBoost can handle categorical data natively, but the random forest classifier inside our RFClassifierSelectFromModel feature selector cannot, which makes that selector difficult to use in the CatBoost pipeline.

dsherry commented 4 years ago

Good points. A couple thoughts:

Conceptually, I don't think there's a problem with using one model/estimator to do feature selection for another. I'm sure there are ways that could produce poor performance, but we can measure that in the performance tests if we set them up well.

RE the encoding: we could build a feature selection component that internally does whatever encoding is required to get the component working, right? As far as I know, there's no fundamental blocker to doing something like this.
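A rough sketch of that idea, using scikit-learn directly rather than EvalML's component API: the selector ordinal-encodes non-numeric columns only while fitting the random forest, then returns the chosen columns from the original, unencoded data so a downstream estimator such as CatBoost still sees raw categoricals. The class name `EncodingRFSelector`, the `"median"` threshold, the use of `OrdinalEncoder`, and the toy data are all hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.preprocessing import OrdinalEncoder


class EncodingRFSelector(BaseEstimator, TransformerMixin):
    """Hypothetical selector: encodes internally, returns original columns."""

    def __init__(self, threshold="median", random_state=0):
        self.threshold = threshold
        self.random_state = random_state

    def fit(self, X, y):
        # Treat every non-numeric column as categorical (assumption).
        cat_cols = X.select_dtypes(exclude="number").columns.tolist()
        X_enc = X.copy()
        if cat_cols:
            encoded = OrdinalEncoder().fit_transform(X[cat_cols])
            for i, col in enumerate(cat_cols):
                X_enc[col] = encoded[:, i]
        # The forest only ever sees the encoded copy.
        forest = RandomForestClassifier(
            n_estimators=50, random_state=self.random_state
        )
        self.selector_ = SelectFromModel(forest, threshold=self.threshold)
        self.selector_.fit(X_enc, y)
        # Ordinal encoding keeps one column per feature, so the support
        # mask maps directly back onto the original column names.
        self.selected_columns_ = X.columns[self.selector_.get_support()].tolist()
        return self

    def transform(self, X):
        # Return the selected columns unencoded.
        return X[self.selected_columns_]


# Toy data (assumption): one categorical and one numeric signal, two noise columns.
rng = np.random.default_rng(0)
n = 300
color = rng.choice(["red", "green", "blue"], size=n)
signal = rng.normal(size=n)
X = pd.DataFrame(
    {
        "color": color,
        "signal": signal,
        "noise_a": rng.normal(size=n),
        "noise_b": rng.normal(size=n),
    }
)
y = ((color == "red") | (signal > 1.0)).astype(int)

selector = EncodingRFSelector().fit(X, y)
X_selected = selector.transform(X)
print(X_selected.columns.tolist())
```

Ordinal encoding imposes an arbitrary order on categories, which tree models tolerate reasonably well; one-hot encoding would need an extra step to map encoded columns back to the original features.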