cosmicBboy / ml-research

Research projects in Machine Learning
MIT License

[metalearn] columntransformer re-arranges the input matrix based on column indices specified in transformers arg #12

Closed cosmicBboy closed 5 years ago

cosmicBboy commented 5 years ago

This causes a bug in pipelines that chain multiple column transformers. For example, the SimpleImputer step creates a new array in which the categorical features are stacked on the left, followed by the continuous features. The OneHotEncoder step that comes next still depends on the categorical features being in their original positions, so its column indices now point at the wrong columns.
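A minimal sketch that reproduces the re-ordering with scikit-learn's `ColumnTransformer`. The toy matrix and the index lists are made up for illustration and are not taken from the metalearn code.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Toy data (assumed for illustration): column 1 is an integer-coded
# categorical feature, columns 0 and 2 are continuous.
X = np.array([
    [0.5, 1.0, 10.0],
    [np.nan, 0.0, 20.0],
    [1.5, np.nan, 30.0],
    [2.5, 1.0, 40.0],
])
categorical_idx = [1]
continuous_idx = [0, 2]

# The output columns follow the order of the transformers list,
# not the column order of the input matrix.
imputer = ColumnTransformer([
    ("cat", SimpleImputer(strategy="most_frequent"), categorical_idx),
    ("num", SimpleImputer(strategy="mean"), continuous_idx),
])
X_imputed = imputer.fit_transform(X)

# The categorical feature is now in column 0, so a later step that
# still selects column 1 would pick up a continuous feature instead.
print(X_imputed)
```

Any later transformer that re-uses `categorical_idx = [1]` on `X_imputed` would encode the first continuous column.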

Potential Solutions

  1. pre-stack the arrays in the following order: continuous, datetime, categorical
  2. use dataframes instead and specify a list of column-name strings in each transformer instead of int indices (need to test)
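A rough sketch of what option (1) could look like. `prestack` is a hypothetical helper written for this comment, not the function used in the actual fix. It re-orders the columns once and returns new, contiguous index lists for each data type.

```python
import numpy as np


def prestack(X, continuous_idx, datetime_idx, categorical_idx):
    """Hypothetical helper: reorder columns as continuous, datetime, categorical.

    Returns the reordered array plus the new index lists for each group.
    """
    order = list(continuous_idx) + list(datetime_idx) + list(categorical_idx)
    X_stacked = X[:, order]
    n_cont, n_dt = len(continuous_idx), len(datetime_idx)
    new_continuous = list(range(n_cont))
    new_datetime = list(range(n_cont, n_cont + n_dt))
    new_categorical = list(range(n_cont + n_dt, len(order)))
    return X_stacked, new_continuous, new_datetime, new_categorical


# Toy example: column 0 is categorical, 1 and 3 continuous, 2 datetime-like.
X = np.arange(12).reshape(3, 4)
X_stacked, cont, dt, cat = prestack(
    X, continuous_idx=[1, 3], datetime_idx=[2], categorical_idx=[0]
)
print(X_stacked[0], cont, dt, cat)
```

This only helps if every column transformer in the pipeline then lists its transformers in the same continuous, datetime, categorical order, so that the output order matches the input order at each step. That is an assumption of this sketch.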

Solution

do (1).

(2) will only work if the task environment programmatically groups data-type-compatible data and feature preprocessors together. This may be the direction to go eventually for a more robust system, but for now (1) will solve the problem.
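For reference, a sketch of what the grouping in (2) might look like: one ColumnTransformer that selects columns by name and keeps all preprocessors for a data type in the same sub-pipeline. The column names and the choice of preprocessors are assumptions for illustration and have not been tested against the task environment.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy dataframe (assumed): "color_code" is an integer-coded categorical.
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0, 30.0],
    "color_code": [0.0, 1.0, np.nan, 0.0],
    "income": [1.0, 2.0, 3.0, 4.0],
})

# All categorical preprocessing lives in one sub-pipeline, so the encoder
# never needs to know where the imputer placed its output columns.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
continuous = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
])

preprocessor = ColumnTransformer(
    [
        ("continuous", continuous, ["age", "income"]),
        ("categorical", categorical, ["color_code"]),
    ],
    sparse_threshold=0.0,  # always return a dense array
)
X_out = preprocessor.fit_transform(df)
print(X_out.shape)
```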

cosmicBboy commented 5 years ago

fixed by 4ae48b985a2ee5353f25d4efec52663699a03e98