EpistasisLab / tpot2

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
https://epistasislab.github.io/tpot2/
GNU Lesser General Public License v3.0

Dev #120

Closed perib closed 7 months ago

perib commented 8 months ago


What does this PR do?

Some bug fixes

Edited ColumnOneHotEncoder to simulate the behavior of the original OneHotEncoder. It now automatically selects columns with fewer than 10 unique values and one-hot encodes them (same behavior as TPOT1). The original OneHotEncoder is not compatible with pandas DataFrames, but this one should be. Replaced OneHotEncoder with ColumnOneHotEncoder in the tpot2 search space. We could later make the unique-value threshold a searchable parameter.
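For illustration, the column-selection rule can be sketched roughly as below. This is not the tpot2 implementation: the class name `AutoOneHotSketch`, its `max_unique` parameter, and the output column naming are all hypothetical, and the sketch uses plain pandas rather than the actual encoder.

```python
import pandas as pd


class AutoOneHotSketch:
    """Hypothetical stand-in, NOT the tpot2 ColumnOneHotEncoder."""

    def __init__(self, max_unique=10):
        # Assumed threshold: columns with fewer distinct values get encoded.
        self.max_unique = max_unique

    def fit(self, X, y=None):
        X = pd.DataFrame(X)
        # Record the sorted categories of every low-cardinality column.
        self.categories_ = {
            col: sorted(X[col].dropna().unique())
            for col in X.columns
            if X[col].nunique() < self.max_unique
        }
        return self

    def transform(self, X):
        X = pd.DataFrame(X)
        # Columns that were not selected pass through unchanged.
        passthrough = X.drop(columns=list(self.categories_))
        # One 0/1 indicator column per (column, category) pair;
        # categories unseen during fit yield all-zero rows.
        encoded = {
            f"{col}_{cat}": (X[col] == cat).astype(int)
            for col, cats in self.categories_.items()
            for cat in cats
        }
        return pd.concat(
            [passthrough, pd.DataFrame(encoded, index=X.index)], axis=1
        )


df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"] * 3,  # 3 unique values: encoded
    "size": list(range(12)),                       # 12 unique values: kept
})
out = AutoOneHotSketch().fit(df).transform(df)
print(list(out.columns))
```

Because the sketch works on column labels rather than positional indices, it accepts pandas DataFrames directly, which is the compatibility point raised above.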

A bug in the initial pipeline generator caused all initial pipelines to be of size 1 when leaf_config_dict was not set. Added a check so that the initial population pipelines include more nodes from inner_config_dict when leaf_config_dict is None.

A typo prevented the complexity scorer from recursively searching sklearn Pipeline classes. Fixed the typo so that the estimator itself is passed to the recursive function. Previously it passed the whole (name, estimator) tuple from pipeline.steps rather than the estimator stored as the second element (index 1) of that tuple.
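The shape of this fix can be illustrated with a minimal sketch. The function `count_estimators` and its "count the leaf estimators" measure are hypothetical and stand in for the real tpot2 complexity scorer; only `Pipeline.steps` holding (name, estimator) tuples is actual scikit-learn behavior.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def count_estimators(est):
    """Hypothetical complexity measure: number of non-Pipeline estimators."""
    if isinstance(est, Pipeline):
        # Each entry of est.steps is a (name, estimator) tuple, so recurse
        # on step[1]. Recursing on the tuple itself would never match the
        # isinstance check, and nested pipelines would count as one node.
        return sum(count_estimators(step[1]) for step in est.steps)
    return 1


inner = Pipeline([("standard", StandardScaler()), ("minmax", MinMaxScaler())])
outer = Pipeline([("prep", inner), ("clf", LogisticRegression())])
complexity = count_estimators(outer)
print(complexity)
```

With the tuple passed by mistake, the nested `inner` pipeline in this example would be counted as a single node instead of two.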