This change fixes a failure that occurs when categorical columns contain missing values: estimators such as Logistic Regression fail, whereas TPOT does not, because TPOT imputes missing values on its own.
Currently, only numerical features have an imputation step. For categorical features, after discussing with our data scientists, we chose to treat a missing value as a separate category.
For the Neither type, we temporarily use the categorical encoder. The reason is explained in the code comment.
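
To illustrate the two strategies described above, here is a minimal, library-free sketch; it is not the code in this PR. The helper names `impute_numeric` and `encode_categorical` and the `MISSING` sentinel are hypothetical, and mean imputation is only an assumption, since the description does not say which strategy the numerical step uses.

```python
from statistics import mean

# Hypothetical sentinel used as the extra category for missing values.
MISSING = "__missing__"


def impute_numeric(values):
    """Replace None with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if observed else 0.0
    return [fill if v is None else v for v in values]


def encode_categorical(values):
    """One-hot encode string categories, treating None as its own category."""
    labelled = [MISSING if v is None else v for v in values]
    categories = sorted(set(labelled))
    rows = [[1 if v == c else 0 for c in categories] for v in labelled]
    return categories, rows


ages = impute_numeric([20, None, 40])
categories, encoded = encode_categorical(["red", None, "blue", "red"])
```

Because the missing value gets its own one-hot column, no categorical row is dropped or left undefined, so a downstream estimator that cannot handle missing values still receives a complete matrix.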