Pandas update - Githubissues

EpistasisLab / tpot2

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

GNU Lesser General Public License v3.0

201 stars, 28 forks

What does this PR do?

Adds a new "Eval Error" column to the evaluated individuals. This keeps the score columns as floats, with no strings mixed in. All evaluation errors now go in the "Eval Error" column, which has object dtype. This should resolve the incompatibilities with pandas 2.0+. Updated for both the steady-state and base estimator versions of tpot2.
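A minimal sketch of the column layout described above, not the actual tpot2 code. The individual names, the "accuracy" column, and the "INVALID" label are hypothetical placeholders. Scores stay in a float column (NaN where evaluation failed), and error labels go in a separate object-dtype column.

```python
import numpy as np
import pandas as pd

# Hypothetical evaluation results: (individual, score or None, error label or None).
results = [
    ("pipeline_a", 0.91, None),
    ("pipeline_b", None, "INVALID"),
    ("pipeline_c", None, "TIMEOUT"),
]

evaluated = pd.DataFrame(
    {
        "Individual": [name for name, _, _ in results],
        # Failed evaluations get NaN, so the score column stays float64.
        "accuracy": pd.Series(
            [np.nan if score is None else score for _, score, _ in results],
            dtype="float64",
        ),
        # Error labels live in their own object column instead of the score column.
        "Eval Error": pd.Series([err for _, _, err in results], dtype="object"),
    }
)

print(evaluated.dtypes)
```

With this layout, numeric operations on the score column do not have to deal with mixed string and float values.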

While trying to print some of the evaluated individual columns, I noticed a simple bug in how __str__ was defined for the graph individuals. It normally works by exporting a pipeline and then printing the string for that, but the export can fail if the hyperparameters are invalid. I just added a try-except block to catch those cases. (This will change again in the next update, so I didn't want to make a whole new PR for it.)
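The try-except guard can be sketched roughly as follows. The class, its export_pipeline method, and the fallback string are hypothetical stand-ins for illustration, not tpot2's actual implementation.

```python
class GraphIndividual:
    """Hypothetical stand-in for a graph individual; not tpot2's class."""

    def __init__(self, hyperparameters):
        self.hyperparameters = hyperparameters

    def export_pipeline(self):
        # Stand-in for pipeline export: fails when a hyperparameter is invalid.
        if self.hyperparameters.get("n_estimators", 1) <= 0:
            raise ValueError("n_estimators must be positive")
        return f"Pipeline({self.hyperparameters})"

    def __str__(self):
        try:
            # Normal path: the string of the exported pipeline.
            return str(self.export_pipeline())
        except Exception:
            # Fallback so printing never raises for an invalid individual.
            return f"<unexportable individual: {self.hyperparameters}>"


good = GraphIndividual({"n_estimators": 10})
bad = GraphIndividual({"n_estimators": 0})
print(good)
print(bad)
```

Catching the exception inside the string conversion means printing a table of individuals keeps working even when some of them cannot be exported.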

There is an edge case: if an individual is created but its evaluation is incomplete, the value in the "Eval Error" column is np.nan instead of None. This can happen if the global timeout (max_time_seconds) is triggered. We don't want to label those as "timeout", since that label should be reserved for evaluations that go over max_eval_time_seconds.

But I'm not sure whether we can change the default missing value in pandas to None, and pandas doesn't let us add NaNs at the same time as we add the error strings. I think we just leave it as is for now?
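For anyone reading the column downstream, here is a small sketch of why leaving the mix of None and np.nan may be tolerable. The data is made up. It relies only on pd.isna treating both values as missing, and it also shows one way a caller could normalize the column afterwards if needed.

```python
import numpy as np
import pandas as pd

# Hypothetical "Eval Error" column: None (no error), a label, and np.nan
# (individual created but never fully evaluated).
eval_error = pd.Series([None, "TIMEOUT", np.nan], dtype="object")

# pd.isna treats None and np.nan alike, so checks need not tell them apart.
missing = eval_error.isna()
print(missing.tolist())

# Optional normalization by the caller: map every missing value to None.
normalized = pd.Series(
    [None if pd.isna(value) else value for value in eval_error],
    dtype="object",
)
print(normalized.tolist())
```

This does not change pandas' default missing value. It only shows that the two markers can be handled uniformly after the fact.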

Pandas update #121
