EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

ConvergenceWarning: Liblinear failed to converge, increase the number of iterations. #830

Open

CBrauer commented 5 years ago

When I run the following code:


from sklearn.ensemble import GradientBoostingRegressor
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import ElasticNetCV, RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline, make_union
from sklearn.svm import LinearSVR
from tpot.builtins import StackingEstimator
from xgboost import XGBRegressor

# Average CV score on the training set was: -0.21141374399237495
exported_pipeline = make_pipeline(
    StackingEstimator(estimator=GradientBoostingRegressor(alpha=0.8, 
                                                          learning_rate=1.0,
                                                          loss="quantile",
                                                          max_depth=8,
                                                          max_features=0.7000000000000001,
                                                          min_samples_leaf=16,
                                                          min_samples_split=5,
                                                          n_estimators=100,
                                                          subsample=0.6500000000000001)),
    StackingEstimator(estimator=XGBRegressor(learning_rate=0.01,
                                             max_depth=5,
                                             min_child_weight=10,
                                             n_estimators=500,
                                             nthread=1, 
                                             subsample=0.55)),
    StackingEstimator(estimator=LinearSVR(C=1.0,
                                          dual=True,
                                          epsilon=0.1,
                                          loss="epsilon_insensitive",
                                          tol=0.01)),
    StackingEstimator(estimator=RidgeCV()),
    StackingEstimator(estimator=LinearSVR(C=20.0,
                                          dual=True,
                                          epsilon=0.0001,
                                          loss="epsilon_insensitive",
                                          tol=0.001)),
    Nystroem(gamma=0.15000000000000002,
             kernel="laplacian",
             n_components=6),
    StackingEstimator(estimator=ElasticNetCV(l1_ratio=0.05,
                                             tol=0.01)),
    KNeighborsRegressor(n_neighbors=91,
                        p=2,
                        weights="distance")
)
exported_pipeline.fit(X_train, y_train)
score = exported_pipeline.score(X_test, y_test)
print('\nScore: ', score)

I get:

ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.

Any help in making this warning message go away will be greatly appreciated.

Charles

weixuanfu commented 5 years ago
import warnings
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    exported_pipeline.fit(X_train, y_train)

Using the warnings module may help.
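A narrower variant of the suggestion above is to suppress only `ConvergenceWarning` rather than every warning, so that unrelated problems still surface. This is a self-contained sketch on synthetic data (the original `exported_pipeline`, `X_train`, and `y_train` are not reproduced here); `max_iter=1` is set deliberately to force liblinear to stop early and raise the warning.

```python
import warnings

from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.svm import LinearSVR

# Synthetic regression data standing in for the poster's dataset
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# max_iter=1 makes liblinear stop before converging, reliably
# triggering the ConvergenceWarning from the original report
model = LinearSVR(max_iter=1, dual=True)

with warnings.catch_warnings():
    # Ignore only ConvergenceWarning; other warning types still print
    warnings.simplefilter('ignore', category=ConvergenceWarning)
    model.fit(X, y)

print(model.coef_.shape)  # the model is fitted despite the early stop
```

Note that silencing the warning does not change the fitted model; liblinear still returns the coefficients it reached when it ran out of iterations.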

CBrauer commented 5 years ago

I'm nervous about suppressing warnings. Should I trust the results anyway? One piece of advice I found on the Web was to normalize the data. Can I simply put RobustScaler() as the first step of the pipeline?

weixuanfu commented 5 years ago

I think StandardScaler() or RobustScaler() may avoid the warning, but the pipeline may not perform as well on the normalized data, since TPOT evaluated the pipeline on the raw data.
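For illustration, here is a minimal sketch of the scaling idea: prepend RobustScaler() so every downstream estimator sees normalized features. The small pipeline and synthetic data are stand-ins for the exported TPOT pipeline above, not a reproduction of it.

```python
from sklearn.datasets import make_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import LinearSVR

# Synthetic data standing in for the poster's dataset
X, y = make_regression(n_samples=300, n_features=10, noise=0.5, random_state=42)

scaled_pipeline = make_pipeline(
    # Scales each feature by its median and IQR, so it is robust to outliers
    RobustScaler(),
    # A higher max_iter also gives liblinear more room to converge
    LinearSVR(C=1.0, epsilon=0.1, tol=0.01, max_iter=10000),
)
scaled_pipeline.fit(X, y)
print(scaled_pipeline.score(X, y))
```

As noted above, adding a scaler changes the inputs every later step was tuned on, so the CV score TPOT reported for the unscaled pipeline no longer applies; re-running the TPOT search with the scaler in place would be the cleaner fix.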

bluexie commented 4 years ago

Standardization was not implemented. I ran into the same problem and solved it: just standardize the training and test sets.

aghaPathan commented 4 years ago

I encountered the same problem. I used GridSearchCV with an array of candidate values for the number of iterations and adjusted the array accordingly. Eventually it converged at one of the values.
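One way to read the comment above, sketched on synthetic data: treat `max_iter` as a hyperparameter and let GridSearchCV pick a value at which the solver converges. The candidate values here are illustrative, not the ones the commenter used.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVR

# Synthetic data standing in for the poster's dataset
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# Candidate iteration budgets; widen the grid if the best value
# lands on the upper edge
param_grid = {'max_iter': [1000, 5000, 10000, 50000]}

search = GridSearchCV(LinearSVR(C=1.0, epsilon=0.1), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

A caveat: GridSearchCV selects the `max_iter` with the best CV score, which is not necessarily the smallest value at which liblinear converges; if the goal is only to silence the warning, setting a single generous `max_iter` directly on the estimator is simpler.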