Open hanshupe opened 3 years ago
I don't think the current TPOT supports this kind of application. Supporting it would require adding the ColumnTransformer from scikit-learn to TPOT.
Interested in working on this. `ColumnTransformer` takes a list of columns. How can this list be made available to the genetic algorithm?
OK, I have an idea. In `TPOTBase.fit()`, add a hook called `_dynamically_modify_config_dict()`, which will modify `config_dict` with a dict like this:
'tpot.builtins.ColumnTransformer': {
    'transformers': ['sklearn.preprocessing.StandardScaler', 'sklearn.preprocessing.RobustScaler', ...],
    'include_col_1': [True, False],
    'include_col_2': [True, False],
    ...
    'include_col_n': [True, False],
},
What do you think @weixuanfu ?
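To make the proposal above more concrete, here is a minimal sketch of how a set of `include_col_i` flags could be turned into a scikit-learn `ColumnTransformer`. The helper name `build_column_transformer` is hypothetical and not part of TPOT. The sketch assumes each flag maps one-to-one onto a column index and that unselected columns should pass through unchanged.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import RobustScaler


def build_column_transformer(transformer, include_flags):
    """Hypothetical helper (not a TPOT API): apply `transformer` only to
    the columns whose flag is True and pass the rest through unchanged."""
    selected = [i for i, keep in enumerate(include_flags) if keep]
    return ColumnTransformer(
        transformers=[("selected", transformer, selected)],
        remainder="passthrough",
    )


# Toy data: only the third column contains an outlier.
X = np.array([[1.0, 10.0, 100.0],
              [2.0, 20.0, 200.0],
              [3.0, 30.0, 3000.0]])

# Flags as the genetic algorithm might pick them: scale column 2 only.
ct = build_column_transformer(RobustScaler(), [False, False, True])
Xt = ct.fit_transform(X)
```

Note that `ColumnTransformer` places the transformed columns first and the passthrough columns after them, so the column order of `Xt` differs from that of `X` here. Any downstream step that relies on column positions would need to account for this.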
I see that after the TPOT optimization a preprocessor like RobustScaler was selected. I wonder if it's possible to apply RobustScaler not to the entire set of features but only to the few where it makes sense. One feature may have outliers and require RobustScaler(0.1, 0.9), while the other features do not. Can this be considered with TPOT?
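Outside of TPOT, this per-feature scaling can be done by hand in scikit-learn. The sketch below assumes that (0.1, 0.9) refers to the 10th and 90th percentiles; `RobustScaler` takes its `quantile_range` in percent, so that is written as `(10.0, 90.0)`. The data and the choice of column 0 as the outlier feature are made up for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import RobustScaler

# Illustrative data: column 0 has an outlier (100.0), the others do not.
X = np.array([[1.0, 0.50, 7.0],
              [2.0, 0.60, 8.0],
              [3.0, 0.40, 9.0],
              [4.0, 0.50, 7.5],
              [100.0, 0.55, 8.5]])

# Scale only column 0, using the 10th-90th percentile range
# (assumed reading of the (0.1, 0.9) in the question).
pre = ColumnTransformer(
    transformers=[
        ("robust_col0", RobustScaler(quantile_range=(10.0, 90.0)), [0]),
    ],
    remainder="passthrough",  # leave all other columns untouched
)

Xt = pre.fit_transform(X)
```

After fitting, column 0 of `Xt` is centered on its median and scaled by the chosen quantile range, while columns 1 and 2 are identical to the input.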