EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

Mutation operator is not mutating all pipeline steps #1119

Closed. hanshupe closed this issue 4 years ago.

hanshupe commented 4 years ago

I called TPOT with template='Selector-Transformer-Regressor' and observed poor optimization behavior, so I started printing the pipelines of each generation. I saw that, after the randomly initialized population, the mutation operator exchanges only the Selector step; it never exchanges the Transformer or the Regressor step for another operator, and it only mutates the parameters inside those steps.

Because of this behavior I very quickly end up with a population like the one below, where only the Selector step keeps mutating while the population fills up with the same regression model and transformer, which are never exchanged again. The optimization therefore soon gets stuck in a local maximum that depends heavily on the initial population.

SGDRegressor(StandardScaler(SelectKBest(input_matrix, SelectKBest__k=5)), SGDRegressor__alpha=0.001, SGDRegressor__eta0=0.01, SGDRegressor__fit_intercept=True, SGDRegressor__l1_ratio=0.9, SGDRegressor__learning_rate=invscaling, SGDRegressor__loss=huber, SGDRegressor__penalty=elasticnet, SGDRegressor__power_t=0.1, SGDRegressor__shuffle=False)
SGDRegressor(StandardScaler(SelectFromModel(input_matrix, SelectFromModel__ElasticNetCV__copy_X=True, SelectFromModel__ElasticNetCV__cv=6, SelectFromModel__ElasticNetCV__eps=0.001, SelectFromModel__ElasticNetCV__l1_ratio=0.28, SelectFromModel__ElasticNetCV__max_iter=1855, SelectFromModel__ElasticNetCV__n_alphas=100, SelectFromModel__ElasticNetCV__normalize=True, SelectFromModel__ElasticNetCV__selection=cyclic, SelectFromModel__ElasticNetCV__tol=0.001, SelectFromModel__max_features=44)), SGDRegressor__alpha=0.025, SGDRegressor__eta0=0.01, SGDRegressor__fit_intercept=True, SGDRegressor__l1_ratio=1.0, SGDRegressor__learning_rate=invscaling, SGDRegressor__loss=huber, SGDRegressor__penalty=elasticnet, SGDRegressor__power_t=0.1, SGDRegressor__shuffle=False)
SGDRegressor(StandardScaler(RFE(input_matrix, RFE__LassoCV__copy_X=True, RFE__LassoCV__cv=6, RFE__LassoCV__eps=0.05, RFE__LassoCV__max_iter=1195, RFE__LassoCV__n_alphas=100, RFE__LassoCV__normalize=True, RFE__LassoCV__tol=0.01, RFE__n_features_to_select=17, RFE__step=0.060000000000000005)), SGDRegressor__alpha=0.001, SGDRegressor__eta0=0.01, SGDRegressor__fit_intercept=True, SGDRegressor__l1_ratio=0.2, SGDRegressor__learning_rate=invscaling, SGDRegressor__loss=huber, SGDRegressor__penalty=elasticnet, SGDRegressor__power_t=0.25, SGDRegressor__shuffle=False)
SGDRegressor(StandardScaler(SelectPercentile(input_matrix, SelectPercentile__percentile=22)), SGDRegressor__alpha=0.05, SGDRegressor__eta0=1.0, SGDRegressor__fit_intercept=True, SGDRegressor__l1_ratio=0.0, SGDRegressor__learning_rate=invscaling, SGDRegressor__loss=huber, SGDRegressor__penalty=elasticnet, SGDRegressor__power_t=0.25, SGDRegressor__shuffle=False)
SGDRegressor(StandardScaler(RFE(input_matrix, RFE__LassoCV__copy_X=True, RFE__LassoCV__cv=6, RFE__LassoCV__eps=0.05, RFE__LassoCV__max_iter=1195, RFE__LassoCV__n_alphas=100, RFE__LassoCV__normalize=True, RFE__LassoCV__tol=0.01, RFE__n_features_to_select=17, RFE__step=0.11000000000000001)), SGDRegressor__alpha=0.001, SGDRegressor__eta0=0.01, SGDRegressor__fit_intercept=True, SGDRegressor__l1_ratio=0.2, SGDRegressor__learning_rate=invscaling, SGDRegressor__loss=huber, SGDRegressor__penalty=elasticnet, SGDRegressor__power_t=0.1, SGDRegressor__shuffle=False)
SGDRegressor(StandardScaler(RFE(input_matrix, RFE__LassoCV__copy_X=True, RFE__LassoCV__cv=6, RFE__LassoCV__eps=0.1, RFE__LassoCV__max_iter=1195, RFE__LassoCV__n_alphas=100, RFE__LassoCV__normalize=True, RFE__LassoCV__tol=0.01, RFE__n_features_to_select=17, RFE__step=0.11000000000000001)), SGDRegressor__alpha=0.001, SGDRegressor__eta0=0.01, SGDRegressor__fit_intercept=True, SGDRegressor__l1_ratio=0.2, SGDRegressor__learning_rate=invscaling, SGDRegressor__loss=huber, SGDRegressor__penalty=elasticnet, SGDRegressor__power_t=0.1, SGDRegressor__shuffle=False)
SGDRegressor(StandardScaler(SelectKBest(input_matrix, SelectKBest__k=12)), SGDRegressor__alpha=0.01, SGDRegressor__eta0=1.0, SGDRegressor__fit_intercept=True, SGDRegressor__l1_ratio=0.2, SGDRegressor__learning_rate=invscaling, SGDRegressor__loss=huber, SGDRegressor__penalty=elasticnet, SGDRegressor__power_t=0.25, SGDRegressor__shuffle=False)
SGDRegressor(StandardScaler(RFE(input_matrix, RFE__LassoCV__copy_X=True, RFE__LassoCV__cv=3, RFE__LassoCV__eps=0.01, RFE__LassoCV__max_iter=1286, RFE__LassoCV__n_alphas=100, RFE__LassoCV__normalize=True, RFE__LassoCV__tol=0.001, RFE__n_features_to_select=21, RFE__step=0.19)), SGDRegressor__alpha=0.01, SGDRegressor__eta0=0.01, SGDRegressor__fit_intercept=True, SGDRegressor__l1_ratio=0.3, SGDRegressor__learning_rate=invscaling, SGDRegressor__loss=huber, SGDRegressor__penalty=elasticnet, SGDRegressor__power_t=0.1, SGDRegressor__shuffle=False)
SGDRegressor(StandardScaler(SelectPercentile(input_matrix, SelectPercentile__percentile=78)), SGDRegressor__alpha=1e-05, SGDRegressor__eta0=1.0, SGDRegressor__fit_intercept=True, SGDRegressor__l1_ratio=0.7, SGDRegressor__learning_rate=invscaling, SGDRegressor__loss=squared_loss, SGDRegressor__penalty=l1, SGDRegressor__power_t=0.1, SGDRegressor__shuffle=False)
weixuanfu commented 4 years ago

Could you please provide a demo for reproducing this issue?

FYI, when a template is used to randomly generate fixed-length pipelines, TPOT only uses the point mutation function, and point mutation should randomly mutate either a Primitive (a step, e.g. mutating SGDRegressor to RandomForestRegressor) or a Terminal (a hyperparameter of a step).
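To illustrate the distinction with a generic DEAP-style sketch (this is not TPOT's actual mutation code), point mutation replaces a single node of the expression tree with another node of the same kind, so either a Primitive or a Terminal can change:

import operator
import random

from deap import gp

# Minimal sketch of DEAP-style point mutation (mutNodeReplacement): it picks
# one node and replaces it with a node of the same kind, so either a
# Primitive (an operator) or a Terminal (an argument/constant) may change.
pset = gp.PrimitiveSet("MAIN", 1)
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.mul, 2)
pset.addTerminal(1)
pset.addTerminal(2)

random.seed(0)
tree = gp.PrimitiveTree(gp.genFull(pset, min_=2, max_=2))
print("before:", tree)

mutant, = gp.mutNodeReplacement(gp.PrimitiveTree(tree), pset)
print("after: ", mutant)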

Based on the limited population strings in this issue, I found that the hyperparameters of SGDRegressor differ among some individuals, which may mean that point mutation did happen on the regressor step's hyperparameters.

Also, increasing population_size may help avoid getting stuck in a local optimum.
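For example (a minimal sketch, not a recommendation of specific values), a larger population with the same template could be configured like this:

from tpot import TPOTRegressor

# Illustrative values only: a larger population_size gives the GP more
# diverse Transformer/Regressor choices to keep in the gene pool.
reg = TPOTRegressor(template='Selector-Transformer-Regressor',
                    generations=100,
                    population_size=300,   # default is 100
                    config_dict='TPOT light',
                    verbosity=2,
                    random_state=42)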

hanshupe commented 4 years ago

Below is a simple example to reproduce it. Additionally, I added the following as the last line of the "_random_mutation_operator" function in base.py:

print("\nBef. mutation", str(individual), "\nAft. mutation", str(offspring))

While it does mutate the Selector Primitive, it never mutates the Transformer or the Regressor step. It does mutate the hyperparameters / Terminals of those two, but not the Primitives:

import numpy as np
import pandas as pd
import tpot

seed = 42
np.random.seed(seed)  # seed NumPy's RNG so the random toy data below is reproducible

# 25 samples with 3 random features and a random target
X = pd.DataFrame(np.random.uniform(0, 100, size=(25, 3)), columns=list('ABC'))
y = np.random.uniform(0, 100, size=25)

reg = tpot.TPOTRegressor(template='Selector-Transformer-Regressor',
                         max_time_mins=60,
                         cv=5,
                         n_jobs=1, generations=10000,
                         population_size=100, mutation_rate=0.9,
                         crossover_rate=0.1,
                         verbosity=3, max_eval_time_mins=5,
                         random_state=seed,
                         scoring="r2", subsample=1,
                         early_stop=None,
                         config_dict='TPOT light')

reg.fit(X, y)
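For reference, one way to check this without reading every print is to look at the evaluated_individuals_ dictionary after fitting. The sketch below assumes the reg object from the script above and uses the fact that the outermost operator in each pipeline string is the Regressor step:

from collections import Counter

# evaluated_individuals_ maps each evaluated pipeline string to its stats
# (including the generation it was evaluated in).
regressors_per_gen = {}
for pipeline_str, stats in reg.evaluated_individuals_.items():
    gen = stats.get('generation', 'unknown')
    regressor = pipeline_str.split('(', 1)[0]  # outermost operator = Regressor step
    regressors_per_gen.setdefault(gen, Counter())[regressor] += 1

for gen in sorted(regressors_per_gen, key=str):
    print(gen, dict(regressors_per_gen[gen]))
# With the bug present, later generations collapse to a single regressor name.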
weixuanfu commented 4 years ago

Thank you for reporting this bug. I fixed it via PR #1122 and it will be merged into the development branch soon. It will also be included in the next release of TPOT in mid-October.

To test the development branch, you can install TPOT with the patch into your environment via:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/EpistasisLab/tpot.git@development
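Afterwards, a quick way to confirm the reinstall worked is to check the reported version from Python (the development branch may report the upcoming version number):

import tpot
print(tpot.__version__)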
hanshupe commented 4 years ago

Thx for the quick fix.

weixuanfu commented 4 years ago

The issue is fixed in v0.11.6.