Open mglowacki100 opened 7 years ago
Thanks for filing the issue! I'll look into it in more depth later, but as a quick fix, try upgrading XGBoost. feature_importances_ is a relatively new attribute.
If that doesn't do it, it's likely that they changed their API without a deprecation notice. The build I have on continuous integration passes all the tests. But that relies on a Docker image where we built xgboost a few weeks back.
Let me know if updating to their latest (and auto_ml's latest) version works. If not, I'll look into this more.
Thanks for the detailed issue! Would love to hear any other feedback you have.
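Before upgrading, it can help to confirm which versions are actually installed in the environment that runs the script. A minimal sketch using only the standard library; it assumes Python 3.8+ (for importlib.metadata), and `installed_version` is a made-up helper name, not part of auto_ml or XGBoost:

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as they are installed with pip.
for name in ("xgboost", "auto_ml", "scikit-learn"):
    print(name, installed_version(name))
```

If xgboost shows up as None or as an older release than expected, the script is probably running in a different environment than the one that was upgraded.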
On Sat, Jun 3, 2017 at 3:15 AM mglowacki100 notifications@github.com wrote:
- In the documentation http://auto-ml.readthedocs.io/en/latest/api_docs_for_geeks.html I'd list the models for regression and classification separately, e.g. models for regression:
model_names=[
'ARDRegression', #slow
'AdaBoostRegressor',
'BayesianRidge',
'ElasticNet',
'ExtraTreesRegressor',
'GradientBoostingRegressor',
'Lasso',
'LassoLars',
'LinearRegression',
'OrthogonalMatchingPursuit',
'PassiveAggressiveRegressor',
'RANSACRegressor',
'RandomForestRegressor',
'Ridge',
'SGDRegressor',
#non-scikit models:
'DeepLearningRegressor', #gpu support
#'LGBMRegressor', #!!!problem key
'XGBRegressor' #!!!problem
]
- I've noticed that 'XGBRegressor' requires (or I'm doing something wrong) optimize_final_model=False, or I get this error:
About to run GridSearchCV on the pipeline for the model XGBRegressor to predict y
Fitting 2 folds for each of 8 candidates, totalling 16 fits
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 699, in runfile
execfile(filename, namespace)
File "/usr/lib/python3/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 88, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "/home/mglowacki/Desktop/Mercedes/automl_merc.py", line 62, in <module>
'XGBRegressor' #!!!problem
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 469, in train
self.trained_final_model = self.train_ml_estimator(estimator_names, self._scorer, X_df, y)
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 704, in train_ml_estimator
gscv_results = self.fit_grid_search(X_df, y, grid_search_params, feature_learning=feature_learning)
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 630, in fit_grid_search
gs.fit(X_df, y)
File "/usr/local/lib/python3.5/dist-packages/sklearn/model_selection/_search.py", line 945, in fit
return self._fit(X, y, groups, ParameterGrid(self.param_grid))
File "/usr/local/lib/python3.5/dist-packages/sklearn/model_selection/_search.py", line 564, in _fit
for parameters in parameter_iterable
File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py", line 768, in __call__
self.retrieve()
File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py", line 719, in retrieve
raise exception
File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py", line 682, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
File "/usr/lib/python3.5/multiprocessing/pool.py", line 385, in _handle_tasks
put(task)
File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/pool.py", line 371, in send
CustomizablePickler(buffer, self._reducers).dump(obj)
_pickle.PicklingError: Can't pickle <class 'xgboost.sklearn.XGBRegressor'>: it's not the same object as xgboost.sklearn.XGBRegressor
and ml_for_analytics=False, or I get this error:
Here are the results from our XGBRegressor predicting y
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 944, in _print_ml_analytics_results_random_forest
trained_feature_importances = final_model_obj.model.feature_importances_
AttributeError: 'XGBRegressor' object has no attribute 'feature_importances_'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 699, in runfile
execfile(filename, namespace)
File "/usr/lib/python3/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 88, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "/home/mglowacki/Desktop/Mercedes/automl_merc.py", line 62, in <module>
'XGBRegressor' #!!!problem
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 469, in train
self.trained_final_model = self.train_ml_estimator(estimator_names, self._scorer, X_df, y)
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 671, in train_ml_estimator
trained_final_model = self.fit_single_pipeline(X_df, y, estimator_names[0], feature_learning=feature_learning)
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 553, in fit_single_pipeline
self.print_results(model_name)
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 577, in print_results
self._print_ml_analytics_results_random_forest()
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 947, in _print_ml_analytics_results_random_forest
trained_feature_importances = final_model_obj.model.feature_importance_
AttributeError: 'XGBRegressor' object has no attribute 'feature_importance_'
Thanks for the help!
It was necessary to build xgboost from source (the pip version is not recent enough).
I have auto_ml 2.1.9.
The xgboost update solves the feature_importances_ problem for ml_for_analytics=True,
but with optimize_final_model=True I now get a different error than before:
********************************************************************************************
About to run GridSearchCV on the pipeline for the model XGBRegressor to predict y
Fitting 2 folds for each of 8 candidates, totalling 16 fits
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 699, in runfile
execfile(filename, namespace)
File "/usr/lib/python3/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 88, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "/home/mglowacki/Desktop/Mercedes/automl_merc.py", line 63, in <module>
'XGBRegressor' #!!!problem
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 469, in train
self.trained_final_model = self.train_ml_estimator(estimator_names, self._scorer, X_df, y)
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 704, in train_ml_estimator
gscv_results = self.fit_grid_search(X_df, y, grid_search_params, feature_learning=feature_learning)
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 630, in fit_grid_search
gs.fit(X_df, y)
File "/usr/local/lib/python3.5/dist-packages/sklearn/model_selection/_search.py", line 945, in fit
return self._fit(X, y, groups, ParameterGrid(self.param_grid))
File "/usr/local/lib/python3.5/dist-packages/sklearn/model_selection/_search.py", line 550, in _fit
base_estimator = clone(self.estimator)
File "/usr/local/lib/python3.5/dist-packages/sklearn/base.py", line 69, in clone
new_object_params[name] = clone(param, safe=False)
File "/usr/local/lib/python3.5/dist-packages/sklearn/base.py", line 126, in clone
(estimator, name))
RuntimeError: Cannot clone object XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0,
max_depth=3, min_child_weight=1, missing=None, n_estimators=200,
n_jobs=1, nthread=-1, objective='reg:linear', random_state=0,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=0, silent=True,
subsample=1), as the constructor does not seem to set parameter n_jobs
There is a similar question on Stack Overflow, but it has no solution: https://stackoverflow.com/questions/37646034/scikit-learn-cannot-clone-object-as-the-constructor-does-not-seem-to-set-par
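For context: sklearn's clone rebuilds an estimator from its get_params() and then checks that the constructor stored every parameter unchanged. The RuntimeError above is what that check reports when a value does not round-trip. The sketch below is a simplified, standard-library-only imitation of that check, not sklearn's actual code. `ToyRegressor` and `clone_check` are hypothetical names, and the idea that a deprecated nthread alias overwrites n_jobs inside the constructor is only an assumption about what could produce the n_jobs=1, nthread=-1 combination in the message, not a confirmed diagnosis of XGBoost.

```python
class ToyRegressor:
    """Hypothetical estimator whose constructor rewrites one of its parameters."""

    def __init__(self, n_jobs=1, nthread=None):
        # Assumed pattern: a deprecated alias silently overrides the new name.
        if nthread is not None:
            n_jobs = nthread
        self.n_jobs = n_jobs
        self.nthread = nthread

    def get_params(self):
        return {"n_jobs": self.n_jobs, "nthread": self.nthread}

    def set_params(self, **params):
        # Sets attributes directly and bypasses the constructor logic.
        for name, value in params.items():
            setattr(self, name, value)
        return self


def clone_check(estimator):
    """Rebuild the estimator from its params and verify that they round-trip."""
    params = estimator.get_params()
    rebuilt = type(estimator)(**params)
    rebuilt_params = rebuilt.get_params()
    for name, value in params.items():
        if rebuilt_params[name] != value:
            raise RuntimeError(
                "Cannot clone object, as the constructor does not seem "
                "to set parameter %s" % name
            )
    return rebuilt


est = ToyRegressor().set_params(nthread=-1)  # state: n_jobs=1, nthread=-1
try:
    clone_check(est)
except RuntimeError as exc:
    print(exc)
```

The point of the sketch is the general rule: a constructor that changes or drops an argument breaks cloning, and therefore grid search, even though the estimator trains fine on its own.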
Looks like XGBoost made a change that sklearn's GridSearchCV doesn't like. I'm hoping @gaw89 might have an easy fix in XGBoost for this. In the meantime, you can try checking out the code before this commit and using that.
Thanks for letting me know about this @mglowacki100 !
I've attached my code and dataset: my_dataset.tar.gz
import numpy as np
import pandas as pd
from auto_ml import Predictor

df_train = pd.read_csv('train_A.csv')
df_test = pd.read_csv('test_A.csv')

#y_train = train['y']
#X_train = train.drop('y', axis=1)
#X_train = X_train
#X_test = test

column_descriptions = {
    'y': 'output',
    'ID': 'ignore',
    'X0': 'categorical',
    'X1': 'categorical',
    'X2': 'categorical',
    'X3': 'categorical',
    'X4': 'categorical',
    'X5': 'categorical',
    'X6': 'categorical',
    'X7': 'categorical',
    'X8': 'categorical'
}

ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)

#ml_predictor.train(df_train)
ml_predictor.train(
    df_train,
    optimize_final_model=True,
    perform_feature_selection=True,
    take_log_of_y=False,
    cv=2,
    ml_for_analytics=False,
    # regressors
    model_names=[
        # 'ARDRegression', #- !!!slow,
        # 'AdaBoostRegressor',
        # 'BayesianRidge',
        # 'ElasticNet',
        # 'ExtraTreesRegressor',
        # 'GradientBoostingRegressor',
        # 'Lasso',
        # 'LassoLars',
        # 'LinearRegression',
        # 'OrthogonalMatchingPursuit',
        # 'PassiveAggressiveRegressor',
        # 'RANSACRegressor',
        # 'RandomForestRegressor',
        # 'Ridge',
        # 'SGDRegressor',
        # non-scikit models:
        #'DeepLearningRegressor', #gpu support
        #'LGBMRegressor',
        'XGBRegressor' #!!!problem
    ]
    # classifiers
    # models=[ 'AdaBoostClassifier',
    # 'ExtraTreesClassifier',
    # 'GradientBoostingClassifier',
    # 'LogisticRegression', 'MiniBatchKMeans',
    # 'OrthogonalMatchingPursuit', 'PassiveAggressiveClassifier',
    # 'Perceptron',
    # 'RandomForestClassifier',
    # 'RidgeClassifier', 'SGDClassifier', #more_models
    # 'DeepLearningClassifier',
    # 'LGBMClassifier','XGBClassifier']
)

predictions = ml_predictor.predict(df_test)
np.savetxt("file_name.csv", predictions, delimiter=",", fmt='%s', header='')
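To find out which model names fail without commenting entries in and out by hand, each name can be trained on its own and the failures collected. A minimal sketch: `try_each_model` and `train_one` are hypothetical names, and `train_one` stands for whatever function wraps a single ml_predictor.train(...) call in the script above.

```python
def try_each_model(model_names, train_one):
    """Call train_one(name) for every name; map each name to an error summary or None."""
    results = {}
    for name in model_names:
        try:
            train_one(name)
            results[name] = None
        except Exception as exc:  # broad on purpose: any failure should be recorded
            results[name] = "{}: {}".format(type(exc).__name__, exc)
    return results


# Example wiring (assumes df_train and column_descriptions from the script above):
# def train_one(name):
#     predictor = Predictor(type_of_estimator='regressor',
#                           column_descriptions=column_descriptions)
#     predictor.train(df_train, model_names=[name])
#
# print(try_each_model(['Ridge', 'XGBRegressor'], train_one))
```

Creating a fresh Predictor per model keeps one failing model from leaving state behind that affects the next one.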
About to run GridSearchCV on the pipeline for the model XGBRegressor to predict y
Fitting 2 folds for each of 8 candidates, totalling 16 fits
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 699, in runfile
execfile(filename, namespace)
File "/usr/lib/python3/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 88, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "/home/mglowacki/Desktop/Mercedes/automl_merc.py", line 62, in <module>
'XGBRegressor' #!!!problem
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 469, in train
self.trained_final_model = self.train_ml_estimator(estimator_names, self._scorer, X_df, y)
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 704, in train_ml_estimator
gscv_results = self.fit_grid_search(X_df, y, grid_search_params, feature_learning=feature_learning)
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 630, in fit_grid_search
gs.fit(X_df, y)
File "/usr/local/lib/python3.5/dist-packages/sklearn/model_selection/_search.py", line 945, in fit
return self._fit(X, y, groups, ParameterGrid(self.param_grid))
File "/usr/local/lib/python3.5/dist-packages/sklearn/model_selection/_search.py", line 564, in _fit
for parameters in parameter_iterable
File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py", line 768, in __call__
self.retrieve()
File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py", line 719, in retrieve
raise exception
File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/parallel.py", line 682, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
File "/usr/lib/python3.5/multiprocessing/pool.py", line 385, in _handle_tasks
put(task)
File "/usr/local/lib/python3.5/dist-packages/sklearn/externals/joblib/pool.py", line 371, in send
CustomizablePickler(buffer, self._reducers).dump(obj)
_pickle.PicklingError: Can't pickle <class 'xgboost.sklearn.XGBRegressor'>: it's not the same object as xgboost.sklearn.XGBRegressor
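This kind of PicklingError is raised whenever the class that pickle finds by importing its module is no longer the same object as the class being pickled. That typically happens after a module has been reloaded or re-executed in a long-running session. Whether that is what happened here (for example through Spyder's runfile) is an assumption. A standard-library sketch that reproduces the same kind of error with a throwaway module named toy_models (a made-up name):

```python
import pickle
import sys
import types

# Build a throwaway module and register it so pickle can look it up by name.
toy = types.ModuleType("toy_models")
sys.modules["toy_models"] = toy

source = "class Model:\n    pass\n"
exec(source, toy.__dict__)
old_instance = toy.Model()   # created before the simulated reload

exec(source, toy.__dict__)   # re-executing rebinds toy.Model to a new class object

try:
    pickle.dumps(old_instance)
    error = None
except pickle.PicklingError as exc:
    error = exc
print(error)

# Objects created from the current class still pickle fine.
fresh = pickle.loads(pickle.dumps(toy.Model()))
```

In an interactive session the usual remedy is to restart the interpreter or kernel so that every module is imported exactly once before training starts.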
Here are the results from our XGBRegressor predicting y
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 944, in _print_ml_analytics_results_random_forest
trained_feature_importances = final_model_obj.model.feature_importances_
AttributeError: 'XGBRegressor' object has no attribute 'feature_importances_'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 699, in runfile
execfile(filename, namespace)
File "/usr/lib/python3/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 88, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "/home/mglowacki/Desktop/Mercedes/automl_merc.py", line 62, in <module>
'XGBRegressor' #!!!problem
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 469, in train
self.trained_final_model = self.train_ml_estimator(estimator_names, self._scorer, X_df, y)
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 671, in train_ml_estimator
trained_final_model = self.fit_single_pipeline(X_df, y, estimator_names[0], feature_learning=feature_learning)
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 553, in fit_single_pipeline
self.print_results(model_name)
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 577, in print_results
self._print_ml_analytics_results_random_forest()
File "/usr/local/lib/python3.5/dist-packages/auto_ml/predictor.py", line 947, in _print_ml_analytics_results_random_forest
trained_feature_importances = final_model_obj.model.feature_importance_
AttributeError: 'XGBRegressor' object has no attribute 'feature_importance_'
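Both AttributeErrors above come from looking up a feature-importance attribute that the installed model class does not define. A defensive accessor can try several candidate names and return None instead of raising. This is only a sketch: `get_feature_importances` is a hypothetical helper, not auto_ml's code, and the candidate names are the attribute discussed in this thread plus its singular variant, which is an assumption.

```python
def get_feature_importances(model, candidates=("feature_importances_", "feature_importance_")):
    """Return the first available importance attribute, or None if none exists."""
    for name in candidates:
        values = getattr(model, name, None)
        if values is not None:
            return values
    return None


class OldModel:
    """Stand-in for a model class that predates the attribute."""


class NewModel:
    """Stand-in for a model class that exposes the attribute."""
    feature_importances_ = [0.7, 0.3]


print(get_feature_importances(OldModel()))
print(get_feature_importances(NewModel()))
```

Callers then only need to skip the analytics printout when the result is None, rather than crash after the model has already been trained.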