EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

estimator type #1319

Closed: perib closed this 9 months ago.

perib commented 9 months ago


What does this PR do?

Adds an estimator type property to the TPOT estimator.
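A minimal sketch of the idea, assuming the estimator already carries a boolean flag recording whether it is a classifier (called `classification` here). `EstimatorBase`, `ClassifierSketch` and `RegressorSketch` are hypothetical stand-ins, not TPOT's actual classes. scikit-learn's `is_classifier` has long checked an estimator's `_estimator_type` attribute for the string `"classifier"`, so deriving that string from the flag is one way to expose the type:

```python
class EstimatorBase:
    """Hypothetical stand-in for a shared TPOT-style base class."""

    # Subclasses set this flag; True means the search optimizes a classifier.
    classification = False

    @property
    def _estimator_type(self):
        # Derive the sklearn-style type string from the flag instead of
        # hard-coding it in every subclass.
        return "classifier" if self.classification else "regressor"


class ClassifierSketch(EstimatorBase):
    classification = True


class RegressorSketch(EstimatorBase):
    classification = False


# The same lookup sklearn has traditionally used to detect classifiers.
print(getattr(ClassifierSketch(), "_estimator_type", None))  # classifier
print(getattr(RegressorSketch(), "_estimator_type", None))   # regressor
```

Because `_estimator_type` is a property here, it only resolves to a string on instances; reading `ClassifierSketch._estimator_type` on the class itself returns the property object.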

Any background context you want to provide?

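A change in the latest version of scikit-learn causes errors when TPOT estimators are used with scikit-learn scorers: sklearn cannot tell that a TPOTClassifier is a classifier, treats it as a regressor, and refuses to call predict_proba, raising the error below. The new property tells sklearn whether the estimator is a classifier or a regressor, which resolves the error.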

ValueError: TPOTClassifier should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
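The error comes from the scorer checking the estimator type before it calls predict_proba. Below is a simplified, stdlib-only model of that check, not scikit-learn's actual code: `is_classifier_like`, `get_proba_response`, `WithoutType` and `WithType` are hypothetical names, and the message text mirrors the error above. It shows why an estimator without `_estimator_type == "classifier"` is rejected and why adding the property lets the call through:

```python
class WithoutType:
    """Estimator with predict_proba but no _estimator_type (before the fix)."""

    def predict_proba(self, X):
        # Dummy two-class probabilities, one row per sample.
        return [[0.3, 0.7] for _ in X]


class WithType(WithoutType):
    """Same estimator after adding the property (the fix)."""

    @property
    def _estimator_type(self):
        return "classifier"


def is_classifier_like(estimator):
    # Follows the long-standing sklearn convention of reading _estimator_type.
    return getattr(estimator, "_estimator_type", None) == "classifier"


def get_proba_response(estimator, X):
    # Refuse predict_proba for anything not recognized as a classifier.
    if not is_classifier_like(estimator):
        raise ValueError(
            f"{type(estimator).__name__} should either be a classifier to be used "
            "with response_method=predict_proba or the response_method should be "
            "'predict'. Got a regressor with response_method=predict_proba instead."
        )
    # Keep only the positive-class column, as a binary scorer would.
    return [row[1] for row in estimator.predict_proba(X)]


X = [[0.0], [1.0]]
try:
    get_proba_response(WithoutType(), X)
except ValueError as exc:
    print("before fix:", exc)
print("after fix:", get_proba_response(WithType(), X))  # after fix: [0.7, 0.7]
```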

Here is the code to reproduce the error. The error no longer occurs with this fix.

from sklearn.model_selection import train_test_split
import sklearn.datasets
import sklearn.metrics
from tpot import TPOTClassifier

# Synthetic binary classification problem
X, y = sklearn.datasets.make_classification(n_samples=1000, n_features=45, n_informative=12, n_redundant=7, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.80, test_size=0.20, random_state=42)

# Deliberately tiny search: just enough to get a fitted TPOTClassifier
est = TPOTClassifier(generations=2, population_size=2, verbosity=2, random_state=42, n_jobs=-2, cv=10, scoring='roc_auc')

est.fit(X_train, y_train)

# This scorer uses predict_proba, so sklearn first checks that est is a classifier;
# without the estimator type property this raises the ValueError above
scorer = sklearn.metrics.get_scorer("roc_auc_ovo")

# Score on the held-out split
s = scorer(est, X_test, y_test)

Questions: