This causes a bug in pipelines with multiple column transformers. For example, the SimpleImputer creates a new array in which the categorical features are stacked on the left, followed by the continuous features. This breaks the OneHotEncoder, which depends on the categorical features being in their original positions.
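A minimal sketch of the reordering, assuming the imputer is wrapped in a scikit-learn `ColumnTransformer` with `remainder="passthrough"`. The toy array and the `categorical_idx` name are made up for illustration and are not taken from the actual pipeline.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Toy data (hypothetical): columns 0 and 2 are continuous,
# columns 1 and 3 are integer-coded categorical features.
X = np.array([
    [0.5, 1.0, 10.0, 2.0],
    [1.5, np.nan, 20.0, 0.0],
    [2.5, 1.0, 30.0, 2.0],
])
categorical_idx = [1, 3]

imputer = ColumnTransformer(
    [("cat_imputer", SimpleImputer(strategy="most_frequent"), categorical_idx)],
    remainder="passthrough",
)
X_imputed = imputer.fit_transform(X)

# The transformed (categorical) columns come first in the output and the
# passthrough (continuous) columns follow, so the categorical features now
# sit at positions 0 and 1 instead of 1 and 3.
print(X_imputed)
```

A later OneHotEncoder step that is still configured with `categorical_idx = [1, 3]` would pick up one categorical and one continuous column from this output.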
Potential Solutions
1. Pre-stack the arrays in the following order: continuous, datetime, categorical.
2. Use dataframes instead, and specify a list of strings in the transformer instead of int indices (need to test).
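The pre-stacking option could look roughly like the sketch below. The `prestack` helper and its arguments are hypothetical names, not part of the existing code. It only reorders the columns and reports the new positions of each feature group; how the downstream transformers consume those positions is left to the pipeline.

```python
import numpy as np


def prestack(X, continuous_idx, datetime_idx, categorical_idx):
    """Reorder columns to: continuous, datetime, categorical.

    Returns the reordered array and the new column positions of each group.
    """
    order = list(continuous_idx) + list(datetime_idx) + list(categorical_idx)
    X_stacked = X[:, order]

    n_cont = len(continuous_idx)
    n_dt = len(datetime_idx)
    new_idx = {
        "continuous": list(range(n_cont)),
        "datetime": list(range(n_cont, n_cont + n_dt)),
        "categorical": list(range(n_cont + n_dt, len(order))),
    }
    return X_stacked, new_idx


# Toy data (hypothetical): column 0 is categorical, columns 1 and 3 are
# continuous, column 2 is a datetime stored as a numeric timestamp.
X = np.array([
    [2.0, 0.5, 1.6e9, 10.0],
    [0.0, 1.5, 1.7e9, 20.0],
])
X_stacked, new_idx = prestack(
    X, continuous_idx=[1, 3], datetime_idx=[2], categorical_idx=[0]
)
print(new_idx)
```

With a fixed group order, the position of each group depends only on the group sizes rather than on the layout of the original dataset.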
Solution
Do (1): pre-stack the arrays.
(2) will only work if the task environment programmatically groups data-type-compatible data and feature preprocessors together. This may be the direction to go eventually for a more robust system, but for now (1) will solve the problem.