alteryx / evalml

EvalML is an AutoML library written in Python.
https://evalml.alteryx.com
BSD 3-Clause "New" or "Revised" License

Add target name to pipeline's `input_feature_names` #1576

Open angela97lin opened 3 years ago

angela97lin commented 3 years ago

Note: migrated from #1493 which tracks simply adding target names to the output of predict; this issue tracks a suggestion also made in the thread.

Based on the discussion raised by @gsheni and @kmax12 in Slack, we should update the .input_feature_names value held by our pipelines to hold both the feature column names and the target name.

Currently, .input_feature_names returns a dictionary, where the keys are the components and the corresponding values are the feature names that each component receives. We could update each value to be a tuple, where the first element is the current list of feature names and the second element is the name of the target.

That is, currently we have something like:

{"OHE": ["col1", "col2", "col3"], "Imputer": ["col1_1", "col1_2", "col2", "col3"]}

We could update this to:

{"OHE": (["col1", "col2", "col3"], "target"), "Imputer": (["col1_1", "col1_2", "col2", "col3"], "target")}

I don't see the target name changing, so maybe this is a bit silly, but it also keeps the nice structure we have now where we keep track of what every component sees.
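
For illustration, a minimal sketch of the proposed shape in plain Python. The helper `add_target_name` is hypothetical (it is not an existing evalml function), and the component and column names are just the ones from the example above:

```python
from typing import Dict, List, Optional, Tuple


def add_target_name(
    input_feature_names: Dict[str, List[str]],
    target_name: Optional[str],
) -> Dict[str, Tuple[List[str], Optional[str]]]:
    """Hypothetical helper: pair each component's feature names with the target name."""
    return {
        component: (list(features), target_name)
        for component, features in input_feature_names.items()
    }


# Current structure: component name -> feature names that component sees.
current = {
    "OHE": ["col1", "col2", "col3"],
    "Imputer": ["col1_1", "col1_2", "col2", "col3"],
}

# Proposed structure: component name -> (feature names, target name).
proposed = add_target_name(current, "target")

# Callers would unpack the tuple per component.
ohe_features, ohe_target = proposed["OHE"]
```

One consequence of this shape is that any code indexing the dictionary values as plain lists would need to unpack the tuple instead.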

chukarsten commented 3 years ago

@angela97lin @dsherry : Didn't you merge a PR that addressed this already? I swear I reviewed it.

angela97lin commented 3 years ago

@chukarsten Ah, similar! I merged in https://github.com/alteryx/evalml/pull/1578 which keeps track of the target name, but it doesn't update input_feature_names to do so. I guess this issue would track adding / consolidating this to the input_feature_names attribute, if we still think that's useful.

dsherry commented 3 years ago

Once #1757 is done, this issue also tracks adding the target name to estimators. (It is accessible from a pandas Series via its name attribute.)
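
For reference, a small sketch of reading the target name from a pandas Series. The helper `get_target_name` is hypothetical and only illustrates the `name` attribute; how estimators would store the value is not decided here:

```python
import pandas as pd


def get_target_name(y: pd.Series):
    """Hypothetical helper: return the Series name, or None if it was never set."""
    return y.name


# A named target, as you would get when selecting a DataFrame column.
y = pd.Series([0, 1, 0], name="target")
target_name = get_target_name(y)

# A Series built without a name has name None, so callers need to handle that case.
unnamed = pd.Series([0, 1, 0])
unnamed_target_name = get_target_name(unnamed)
```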