Serialization and deserialization of TPOT

Description

Because a TPOT estimator cannot be pickled directly, we adopt the suggested alternative of pickling its fitted_pipeline_ attribute, as described in https://github.com/EpistasisLab/tpot/issues/520. The author of another post ran into the same problem and also had to fall back on the suggested solution:

The workaround is to fit TPOTClassifier in standalone mode, and create a PMMLPipeline object off the TPOTClassifier.fitted_pipeline_ attribute using the sklearn2pmml.make_pmml_pipeline(Pipeline: pipeline) utility function... The lesson is that the PMML representation is only concerned with the final state – the deployable model. The PMML representation is not concerned with the specifics of the AutoML tool/algorithm, the initial state, or any of the intermediate states of the search process.
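The pickling alternative can be sketched as follows. This is a minimal illustration rather than the code in this PR: dump_fitted_pipeline and load_fitted_pipeline are hypothetical helper names, and ThresholdPipeline and SearchStandIn are stand-ins written only so the example runs without TPOT installed. The lambda attribute on the stand-in merely simulates state that pickle rejects; it is not a claim about why TPOT itself cannot be pickled.

```python
import io
import pickle


def dump_fitted_pipeline(estimator, fileobj):
    """Pickle only the final pipeline, not the AutoML search object."""
    pipeline = getattr(estimator, "fitted_pipeline_", None)
    if pipeline is None:
        raise ValueError("estimator has no fitted_pipeline_; call fit() first")
    pickle.dump(pipeline, fileobj)


def load_fitted_pipeline(fileobj):
    """Restore the pipeline; it can predict but carries no search state."""
    return pickle.load(fileobj)


class ThresholdPipeline:
    """Hypothetical stand-in for the scikit-learn Pipeline that TPOT exposes."""

    def fit(self, X, y):
        # Learn a single cut-off: the mean of the first feature.
        self.threshold_ = sum(row[0] for row in X) / len(X)
        return self

    def predict(self, X):
        return [int(row[0] > self.threshold_) for row in X]


class SearchStandIn:
    """Hypothetical stand-in for a fitted TPOT estimator."""

    def __init__(self):
        # Simulated unpicklable state (assumption for illustration only).
        self._scorer = lambda y_true, y_pred: 0.0
        self.fitted_pipeline_ = None

    def fit(self, X, y):
        self.fitted_pipeline_ = ThresholdPipeline().fit(X, y)
        return self


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
search = SearchStandIn().fit(X, y)

# Round-trip the final pipeline through an in-memory buffer.
buffer = io.BytesIO()
dump_fitted_pipeline(search, buffer)
buffer.seek(0)
restored = load_fitted_pipeline(buffer)
predictions = restored.predict(X)
```

With a real TPOT estimator the same helpers would be called on the fitted TPOTClassifier or TPOTRegressor, and the process that loads the pickle needs the same libraries installed as the one that wrote it.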

Note that if our processed data frame contains NaN, TPOT will automatically impute the missing values. However, that imputation is not part of fitted_pipeline_ and cannot be extracted from any attribute of the TPOT instance. The unpickled pipeline may therefore fail if the test dataset contains missing values. It relies on Foreshadow to produce a data frame without missing values, which is addressed by PR https://github.com/georgianpartners/foreshadow/pull/183.
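Until the upstream data is guaranteed to be complete, a caller could guard the unpickled pipeline with an explicit check. The sketch below is an assumption for illustration, not part of this PR: find_missing and predict_checked are hypothetical helpers, and ConstantPipeline is a stand-in for the restored pipeline. It works on plain lists of rows; with a pandas data frame, a check built on DataFrame.isna would play the same role.

```python
import math


def find_missing(rows):
    """Return (row, column) positions of cells that are None or NaN."""
    missing = []
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is None or (isinstance(value, float) and math.isnan(value)):
                missing.append((i, j))
    return missing


def predict_checked(pipeline, rows):
    """Call pipeline.predict only when the input has no missing values."""
    missing = find_missing(rows)
    if missing:
        raise ValueError(f"input has missing values at (row, column): {missing}")
    return pipeline.predict(rows)


class ConstantPipeline:
    """Hypothetical stand-in for an unpickled fitted pipeline."""

    def predict(self, rows):
        return [0 for _ in rows]


clean_rows = [[1.0, 2.0], [3.0, 4.0]]
dirty_rows = [[1.0, float("nan")], [None, 2.0]]

clean_predictions = predict_checked(ConstantPipeline(), clean_rows)
dirty_positions = find_missing(dirty_rows)
```

Raising early gives a clear message about which cells are missing, rather than an error from deep inside whichever estimator the pipeline happens to end with.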


Serialization and deserialization of TPOT #188
