[metalearn] ColumnTransformer rearranges the input matrix based on the column indices specified in the transformers arg

This causes a bug in pipelines that chain multiple ColumnTransformers. For example, the ColumnTransformer wrapping the SimpleImputer creates a new array in which the categorical features are stacked on the left, followed by the continuous features. The downstream ColumnTransformer wrapping the OneHotEncoder still selects columns by their original integer indices, so it breaks: it depends on the categorical features staying in their original positions.
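A minimal reproduction sketch, assuming a plain NumPy input matrix and scikit-learn's ColumnTransformer, SimpleImputer and OneHotEncoder. The toy data and column layout are illustrative, not taken from the ml-research code. ColumnTransformer concatenates its outputs in the order of the transformers list, so listing the categorical imputer first moves the categorical column to position 0.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Toy matrix (illustrative): column 0 is continuous, column 1 is categorical.
X = np.array([
    [1.5, 0.0],
    [np.nan, 1.0],
    [3.0, np.nan],
])
CONTINUOUS, CATEGORICAL = [0], [1]

# Step 1: impute, with the categorical transformer listed first.
imputer = ColumnTransformer([
    ("categorical", SimpleImputer(strategy="most_frequent"), CATEGORICAL),
    ("continuous", SimpleImputer(strategy="mean"), CONTINUOUS),
])
X_imputed = imputer.fit_transform(X)
# Output columns follow the transformers list, not the input layout:
# column 0 now holds the categorical feature, column 1 the continuous one.

# Step 2: one-hot encode using the ORIGINAL categorical index.
encoder = ColumnTransformer(
    [("onehot", OneHotEncoder(), CATEGORICAL)],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
X_encoded = encoder.fit_transform(X_imputed)
# Index 1 now points at the imputed continuous column, so its three distinct
# values become three one-hot columns, plus one passthrough column.
print(X_encoded.shape)  # (3, 4) instead of the intended (3, 3)
```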

Potential Solutions

(1) Pre-stack the arrays in the following order: continuous, datetime, categorical.
(2) Use DataFrames instead and specify lists of column names (strings) in the transformers instead of int indices (needs testing).
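Option (1) can be sketched as follows, assuming NumPy input and scikit-learn transformers. The prestack helper is hypothetical, not part of the ml-research code. The idea: once the columns are physically ordered continuous | datetime | categorical and the transformers are listed in that same order, the imputer's concatenated output keeps every column where it was. This only holds if each transformer emits one output column per input column, which SimpleImputer does unless a column is entirely missing at fit time (such columns are dropped by default).

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder


def prestack(X, continuous, datetimes, categorical):
    """Hypothetical helper: reorder columns as continuous | datetime | categorical.

    Returns the reordered matrix plus the new, contiguous index lists.
    """
    order = list(continuous) + list(datetimes) + list(categorical)
    n_cont, n_dt = len(continuous), len(datetimes)
    new_cont = list(range(n_cont))
    new_dt = list(range(n_cont, n_cont + n_dt))
    new_cat = list(range(n_cont + n_dt, len(order)))
    return X[:, order], new_cont, new_dt, new_cat


# Original layout (illustrative): categorical, continuous, continuous.
X = np.array([
    [0.0, 1.5, 10.0],
    [np.nan, np.nan, 20.0],
    [1.0, 3.0, np.nan],
])
X_stacked, cont, dt, cat = prestack(
    X, continuous=[1, 2], datetimes=[], categorical=[0]
)

# Transformers listed in the same continuous -> datetime -> categorical order
# (no datetime columns in this toy example, so that transformer is omitted).
imputer = ColumnTransformer([
    ("continuous", SimpleImputer(strategy="mean"), cont),
    ("categorical", SimpleImputer(strategy="most_frequent"), cat),
])
X_imputed = imputer.fit_transform(X_stacked)

# The pre-stacked categorical index is still valid after imputation.
encoder = ColumnTransformer(
    [("onehot", OneHotEncoder(), cat)],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
X_encoded = encoder.fit_transform(X_imputed)
```

Because the pre-stacked order and the transformer order agree, the indices returned by prestack can be reused by every later ColumnTransformer in the pipeline.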

Solution

Do (1).

Option (2) will only work if the task environment programmatically groups data-type-compatible data and feature preprocessors together. That may be the direction to take eventually for a more robust system, but for now (1) solves the problem.

cosmicBboy / ml-research, issue #12
