alteryx / evalml

EvalML is an AutoML library written in Python.
https://evalml.alteryx.com
BSD 3-Clause "New" or "Revised" License

Intelligent Feature Engineering based on column name #2010

Open ParthivNaresh opened 3 years ago

ParthivNaresh commented 3 years ago

Many use cases rely on data that is standard within a domain, such as debt and income levels in financial services, or trial phases and patients recruited in biotech. It would help to be able to automatically identify and engineer features based on their data type and/or column name.

For example:

Additionally, aggregating alternative names for common features could help with this, like taking "Line of Credit", "LOC", and "Credit Line" and mapping them to a single default name that can then be put through the above process.
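As a rough illustration of the alias idea, here is a minimal sketch in plain Python. Nothing in it is an existing EvalML or Woodwork API: the `ALIASES` table, the `RULES` registry, the `canonicalize` and `engineer` helpers, and the `debt_to_income` feature are all hypothetical names chosen for the example, and a real design would presumably work on DataFrame columns and typing information rather than on dict rows.

```python
import re

# Hypothetical alias table: canonical name -> known alternative spellings.
ALIASES = {
    "line_of_credit": ["Line of Credit", "LOC", "Credit Line"],
    "income": ["Income", "Annual Income"],
    "debt": ["Debt", "Total Debt"],
}


def _key(name):
    """Lowercase and collapse punctuation/whitespace so 'Credit-Line' matches 'credit line'."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


# Reverse lookup: normalized alias -> canonical name.
_LOOKUP = {
    _key(alias): canonical
    for canonical, aliases in ALIASES.items()
    for alias in aliases
}
# The canonical names themselves should also resolve to themselves.
_LOOKUP.update({_key(canonical): canonical for canonical in ALIASES})


def canonicalize(columns):
    """Map each column name to its canonical name; unknown names pass through unchanged."""
    return {col: _LOOKUP.get(_key(col), col) for col in columns}


# Hypothetical rule registry: (required canonical columns, new feature name, function).
RULES = [
    (
        ("debt", "income"),
        "debt_to_income",
        lambda debt, income: debt / income if income else None,
    ),
]


def engineer(row):
    """Rename a row's keys to canonical names, then apply every rule whose inputs are present."""
    mapping = canonicalize(row)
    out = {mapping[col]: value for col, value in row.items()}
    for required, new_name, fn in RULES:
        if all(name in out for name in required):
            out[new_name] = fn(*(out[name] for name in required))
    return out
```

With this sketch, `canonicalize(["LOC", "credit line"])` maps both names to `line_of_credit`, and `engineer({"Total Debt": 500.0, "Annual Income": 2000.0})` adds a `debt_to_income` entry alongside the renamed columns. One open question for a design document is what to do when two columns in the same dataset resolve to the same canonical name; this sketch simply lets the later one overwrite the earlier one.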

This would need a design document to determine the scope and the default use cases to build for initially. A follow-up issue could be filed for integrating it into AutoML, to provide automated feature engineering for datasets whose column names match the ones we're looking for.

dsherry commented 3 years ago

Yep! In icebox for now. This feels like it should be a woodwork feature (@gsheni)