Open trhallam opened 5 years ago
TPOT enforces imputation on datasets containing NaN because most operators in the TPOT configuration do not support NaN. We may need a separate configuration if this no_imputation option is added.
I understand; it is a very specific case that I'm working on. Currently, though, there is no way to skip imputation in TPOT unless you modify the source. It is not a necessity, more of a nice-to-have.
As a general rule, TPOT enforces imputation to match the sklearn requirement that all values in the input and output data be real. XGBoost, as a special case, accepts NaN values in its input.
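To make the idea concrete, here is a minimal pure-Python sketch of how a `no_imputation` flag could gate the imputation step. It is not TPOT's code: `check_features` is a hypothetical stand-in for the input check, and column-wise median imputation is only an assumed default for illustration.

```python
import math
from statistics import median


def check_features(rows, no_imputation=False):
    """Hypothetical stand-in for a TPOT-style input check (not TPOT's code)."""
    has_nan = any(math.isnan(v) for row in rows for v in row)
    if not has_nan or no_imputation:
        # Nothing to impute, or the caller opted out: return a copy with
        # any NaN left in place for a NaN-aware estimator to handle.
        return [list(row) for row in rows]

    # Assumed default behaviour: replace each NaN with its column median.
    medians = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        # A column with no observed values cannot be imputed; keep NaN.
        medians.append(median(observed) if observed else math.nan)
    return [
        [medians[j] if math.isnan(v) else v for j, v in enumerate(row)]
        for row in rows
    ]


data = [[1.0, float("nan")], [3.0, 4.0], [5.0, 8.0]]
imputed = check_features(data)                  # NaN replaced by column median
raw = check_features(data, no_imputation=True)  # NaN passed through untouched
```

With the flag set, the missing value reaches the estimator unchanged, which is only safe when every operator in the configuration tolerates NaN.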
Context of the issue
I am trying to optimise XGBoost specifically, using a dataset with quite a lot of holes in it. I do not want to perform imputation as it affects the results. I looked in base.py and quickly modified the `_check_data` function to ignore NaN values and skip imputation, but was wondering if TPOT can be modified to accommodate this scenario with XGBoost? A 'no_imputation' keyword might be added to `TPOTBase.__init__`, for example, to prevent imputation.
Example Edits: