Open kransom14 opened 2 years ago
It seems clear that the importance is calculated on the features before preprocessing. Especially with permutation importance, permuting dummy columns would be nonsensical, since two dummies from the same variable could suddenly both be 1 within a single observation.
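To make that point concrete, here is a minimal sketch in plain Python (not R, and not the code DALEX or DALEXtra actually runs). It assumes a single three-level categorical variable and uses a fixed rotation in place of a random shuffle so the result is reproducible. All names (`LEVELS`, `one_hot`, `rotate`, `count_invalid`, `colour`) are hypothetical and exist only for this illustration.

```python
LEVELS = ["a", "b", "c"]

def one_hot(value):
    """Encode one categorical value as a list of 0/1 dummies, one per level."""
    return [1 if value == level else 0 for level in LEVELS]

def rotate(values, k=1):
    """A deterministic permutation: shift every entry k positions to the right."""
    values = list(values)
    k %= len(values)
    return values[-k:] + values[:-k]

def count_invalid(rows):
    """Count rows that are not a valid one-hot encoding (exactly one 1)."""
    return sum(1 for row in rows if sum(row) != 1)

# Raw categorical column, as it looks before any preprocessing.
colour = ["a", "b", "c", "a", "b", "c"]

# Option 1: permute the original variable, then encode.
# Every row is still a valid one-hot vector.
rows_raw = [one_hot(v) for v in rotate(colour)]

# Option 2: encode first, then permute a single dummy column ("a")
# while the other dummy columns stay where they are.
encoded = [one_hot(v) for v in colour]
dummy_a = rotate([row[0] for row in encoded])
rows_dummy = [[a] + row[1:] for a, row in zip(dummy_a, encoded)]

invalid_raw = count_invalid(rows_raw)      # 0
invalid_dummy = count_invalid(rows_dummy)  # 4
print(invalid_raw, invalid_dummy)
```

In this toy data, permuting the original column leaves every row valid, while permuting one dummy column produces rows where either no dummy or two dummies are set. That is consistent with importance being reported per original variable, though the sketch says nothing about how the packages implement it internally.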
I am using `explain_tidymodels()` to compute variable importance. I have a workflow which includes a recipe with a `step_dummy()` step. I'm trying to understand why the associated variable importance calculated with `model_parts()` is given for the original variables rather than the one-hot-encoded variables when this step is included. Is the permutation importance aggregated at some point for the group of one-hot-encoded variables that go together? I didn't see this explained in the documentation. Reprex below. Please advise. Thank you.