georgian-io-archive / foreshadow

An automatic machine learning system
https://foreshadow.readthedocs.io
Apache License 2.0

Cleaner save empty columns #198

Closed jzhang-gp closed 4 years ago

jzhang-gp commented 4 years ago

Description

Currently, when the cleaner mapper decides to drop empty columns, it does not remember which columns it dropped. During prediction, it instead finds all the empty columns in the test set and drops them.

Normally that's OK if the prediction set has the same empty columns. However, when the test set has empty columns that were not identified during training, they are also dropped, which causes errors downstream. This change guards against that and fails early to alert the user.
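
To illustrate the intended behavior, here is a minimal sketch using pandas. It is not foreshadow's actual cleaner mapper: the class name `EmptyColumnDropper`, the attribute `dropped_columns_`, and the error message are all hypothetical, and "empty" is assumed to mean "every value is missing".

```python
import pandas as pd


class EmptyColumnDropper:
    """Hypothetical stand-in for the cleaner mapper's drop step."""

    def fit(self, X: pd.DataFrame) -> "EmptyColumnDropper":
        # Remember which columns were entirely empty during training.
        self.dropped_columns_ = [c for c in X.columns if X[c].isna().all()]
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # Find columns that are empty now but were not empty at fit time.
        empty_now = [c for c in X.columns if X[c].isna().all()]
        unexpected = [c for c in empty_now if c not in self.dropped_columns_]
        if unexpected:
            # Fail early instead of silently dropping and breaking later steps.
            raise ValueError(
                f"Columns empty at prediction but not at training: {unexpected}"
            )
        # Drop only the columns that were recorded during fit.
        return X.drop(columns=self.dropped_columns_)
```

With this shape, a prediction set whose empty columns match the training set passes through as before, while a newly empty column raises immediately rather than surfacing as a shape mismatch in a later transformer.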