georgian-io-archive / foreshadow

An automatic machine learning system
https://foreshadow.readthedocs.io
Apache License 2.0

Fix exporter issue when providing override from Categorical to Numerical type #200

Closed by jzhang-gp 4 years ago

jzhang-gp commented 4 years ago

Description

DataExporter fails during an intent-override test (Categorical to Numerical) when we train the model, provide the override, and then retrain the model. This was not caught earlier because we had only tested the Numerical-to-Categorical override.

The root cause is that when we change an intent from Categorical to Numerical, the preprocessor (the step immediately before the exporter) generates fewer columns. Suppose column A, with unique values (1, 2, 3, 4), is one-hot encoded as A_1, A_2, A_3, and A_4 in the first training run. In the run after the override, these one-hot-encoded columns are gone.
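A minimal sketch of how the column layout changes for column A, assuming a Categorical feature is one-hot encoded with one column per unique value and a Numerical feature passes through as a single column named after the feature. The `output_columns` helper is hypothetical and is not part of Foreshadow's API.

```python
from typing import List, Sequence


def output_columns(name: str, values: Sequence[int], intent: str) -> List[str]:
    """Hypothetical helper: column names a preprocessor emits for one feature."""
    if intent == "Categorical":
        # One-hot encoding: one indicator column per unique value, sorted.
        return [f"{name}_{v}" for v in sorted(set(values))]
    if intent == "Numerical":
        # Assumption: a numerical feature stays a single column.
        return [name]
    raise ValueError(f"unknown intent: {intent}")


column_a = [1, 2, 3, 4, 2, 1]
print(output_columns("A", column_a, "Categorical"))  # ['A_1', 'A_2', 'A_3', 'A_4']
print(output_columns("A", column_a, "Numerical"))    # ['A']
```

Anything downstream that memorized the four-column layout from the first run will no longer find those columns after the override.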

As a result, the exporter currently cannot handle the change: the column mapping in the exporter's parallel process still expects A_1, A_2, A_3, and A_4.

The fix is to reset the parallel process whenever there is an intent override. Since the exporter performs no computation at all, it is safe to reset it and start from scratch without any performance cost.
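The failure and the fix can be sketched with a toy exporter. This assumes the stale mapping comes from the column mapping being built on the first fit and reused on later fits; `SketchExporter`, `column_mapping`, and `refit_exporter` are hypothetical names, not the actual DataExporter or parallel-process implementation.

```python
from typing import Dict, List, Optional


class SketchExporter:
    """Hypothetical stand-in for the exporter's parallel process. It does no
    computation: it records which columns to pass through and selects them."""

    def __init__(self) -> None:
        self.column_mapping: Optional[List[str]] = None

    def fit(self, columns: List[str]) -> "SketchExporter":
        # Assumed behaviour behind the bug: the mapping is built on the first
        # fit and silently reused on every later fit.
        if self.column_mapping is None:
            self.column_mapping = list(columns)
        return self

    def transform(self, row: Dict[str, float]) -> Dict[str, float]:
        if self.column_mapping is None:
            raise RuntimeError("exporter is not fitted")
        missing = [c for c in self.column_mapping if c not in row]
        if missing:
            raise KeyError(f"columns missing after preprocessing: {missing}")
        return {c: row[c] for c in self.column_mapping}

    def reset(self) -> None:
        # The fix: drop the stale mapping so the next fit starts from scratch.
        # Cheap, because the exporter never computed anything.
        self.column_mapping = None


def refit_exporter(exporter: SketchExporter, columns: List[str],
                   intent_overridden: bool) -> SketchExporter:
    """Retrain step: reset the exporter first whenever an intent was overridden."""
    if intent_overridden:
        exporter.reset()
    return exporter.fit(columns)


# First training run: A is Categorical and one-hot encoded.
exporter = refit_exporter(SketchExporter(), ["A_1", "A_2", "A_3", "A_4"], False)
row_after_override = {"A": 3.0}  # A overridden to Numerical: one column only

# Without a reset, the old mapping survives the refit and transform fails.
try:
    SketchExporter().fit(["A_1", "A_2", "A_3", "A_4"]).fit(["A"]).transform(row_after_override)
except KeyError as err:
    print("without reset:", err)

# With a reset on override, the refit picks up the new layout.
exporter = refit_exporter(exporter, ["A"], intent_overridden=True)
print("with reset:", exporter.transform(row_after_override))  # with reset: {'A': 3.0}
```

Resetting only when an intent is overridden keeps the normal retraining path unchanged, and because the exporter holds no fitted statistics, rebuilding its mapping from scratch costs essentially nothing.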

Some extra things: