ClimbsRocks / auto_ml

[UNMAINTAINED] Automated machine learning for analytics & production
http://auto-ml.readthedocs.io
MIT License

Error running tutorial example #369

Closed vabatista closed 6 years ago

vabatista commented 6 years ago

I'm trying to run the sample code provided here:

And I'm getting this error:

Welcome to auto_ml! We're about to go through and make sense of your data using machine learning, and give you a production-ready pipeline to get predictions with.

If you have any issues, or new feature ideas, let us know at http://auto.ml
Now using the model training_params that you passed in:
{}
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'learning_rate': 0.1, 'warm_start': True}
Running basic data cleaning
Fitting DataFrameVectorizer
Now using the model training_params that you passed in:
{}
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'learning_rate': 0.1, 'warm_start': True}

********************************************************************************************
About to fit the pipeline for the model GradientBoostingRegressor to predict MEDV
Started at:
2018-01-04 14:22:46
[1] random_holdout_set_from_training_data's score is: -8.721
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-45-2b110959d739> in <module>()
     11 ml_predictor = Predictor(type_of_estimator='regressor', column_descriptions=column_descriptions)
     12 
---> 13 ml_predictor.train(df_train)
     14 
     15 ml_predictor.score(df_test, df_test.MEDV)

C:\ProgramData\Anaconda3\lib\site-packages\auto_ml\predictor.py in train(***failed resolving arguments***)
    632 
    633         # This is our main logic for how we train the final model
--> 634         self.trained_final_model = self.train_ml_estimator(self.model_names, self._scorer, X_df, y)
    635 
    636         if self.ensemble_config is not None and len(self.ensemble_config) > 0:

C:\ProgramData\Anaconda3\lib\site-packages\auto_ml\predictor.py in train_ml_estimator(self, estimator_names, scoring, X_df, y, feature_learning, prediction_interval)
   1212         # Use Case 1: Super straightforward: just train a single, non-optimized model
   1213         elif (feature_learning == True and self.optimize_feature_learning != True) or (len(estimator_names) == 1 and self.optimize_final_model != True):
-> 1214             trained_final_model = self.fit_single_pipeline(X_df, y, estimator_names[0], feature_learning=feature_learning, prediction_interval=False)
   1215 
   1216         # Use Case 2: Compare a bunch of models, but don't optimize any of them

C:\ProgramData\Anaconda3\lib\site-packages\auto_ml\predictor.py in fit_single_pipeline(self, X_df, y, model_name, feature_learning, prediction_interval)
    837             print(start_time)
    838 
--> 839         ppl.fit(X_df, y)
    840 
    841         if self.verbose:

C:\ProgramData\Anaconda3\lib\site-packages\auto_ml\utils_model_training.py in fit(self, X, y)
    266 
    267                     self.model.set_params(n_estimators=num_iter, warm_start=warm_start)
--> 268                     self.model.fit(X_fit, y)
    269 
    270                     if self.training_prediction_intervals == True:

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\ensemble\gradient_boosting.py in fit(self, X, y, sample_weight, monitor)
   1005                                     self.estimators_.shape[0]))
   1006             begin_at_stage = self.estimators_.shape[0]
-> 1007             y_pred = self._decision_function(X)
   1008             self._resize_state()
   1009 

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\ensemble\gradient_boosting.py in _decision_function(self, X)
   1123         # not doing input validation.
   1124         score = self._init_decision_function(X)
-> 1125         predict_stages(self.estimators_, X, self.learning_rate, score)
   1126         return score
   1127 

TypeError: Argument 'X' has incorrect type (expected numpy.ndarray, got csr_matrix)

sklearn version: 0.18

After upgrading sklearn to 0.19 it worked, but shouldn't that version be a requirement for pip install?
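For readers hitting the same traceback, here is a minimal sketch of a runtime guard that fails early with a readable message instead of the `TypeError` above. The helper names (`version_tuple`, `meets_minimum`, `check_sklearn`) are hypothetical and not part of auto_ml, and the 0.19 threshold is taken only from this report (0.18 failed, 0.19 worked).

```python
import re


def version_tuple(version, parts=2):
    """Return the first `parts` numeric components of a version string as ints."""
    numbers = re.findall(r"\d+", version)[:parts]
    # Pad short versions such as "1" so they compare as (1, 0).
    numbers += ["0"] * (parts - len(numbers))
    return tuple(int(n) for n in numbers)


def meets_minimum(installed, minimum="0.19"):
    """True if `installed` is at least `minimum` (major.minor comparison only)."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_sklearn(minimum="0.19"):
    """Raise a clear ImportError if the installed scikit-learn is too old.

    Hypothetical helper: the threshold is an assumption based on this issue.
    """
    import sklearn  # imported lazily so the helpers above work without it

    if not meets_minimum(sklearn.__version__, minimum):
        raise ImportError(
            "scikit-learn >= {} is needed, found {}".format(minimum, sklearn.__version__)
        )
```

Calling `check_sklearn()` before `ml_predictor.train(df_train)` would surface the version mismatch up front. At install time, the usual setuptools way to express the same constraint is an `install_requires` entry such as `scikit-learn>=0.19`; whether auto_ml's own packaging declares this is not stated in this thread.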

ClimbsRocks commented 6 years ago

thanks for the bug report! sorry for the delayed response.

i think this is probably a versioning issue, as i can't reproduce it on the latest version of auto_ml.

sklearn, while being a fantastic project in many ways that generally tries to be really good about deprecation warnings, still sometimes releases breaking changes. this, unfortunately, was one of them. all new versions of auto_ml take this into account, and work on both sklearn versions. but older versions of auto_ml didn't know this was going to be an issue (i haven't yet trained an ml predictor to anticipate bugs, though that'd certainly be a fascinating project), so they didn't have that workaround built in yet.
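The thread does not show what the workaround in newer auto_ml versions looks like. As an illustration only, one generic way to sidestep the `expected numpy.ndarray, got csr_matrix` error is to densify sparse input before fitting. The function names below (`to_dense_if_sparse`, `fit_dense`) are hypothetical, and this is an assumption about a possible approach, not auto_ml's actual fix.

```python
def to_dense_if_sparse(X):
    """Return a dense array when X looks like a scipy.sparse matrix, else X unchanged."""
    # scipy.sparse matrices (csr_matrix, csc_matrix, ...) expose .toarray();
    # numpy arrays do not, so they pass through untouched.
    # With scipy available, scipy.sparse.issparse(X) is the more explicit check.
    if hasattr(X, "toarray"):
        return X.toarray()
    return X


def fit_dense(model, X, y):
    """Fit `model` on a dense copy of X so the estimator never sees a sparse matrix."""
    return model.fit(to_dense_if_sparse(X), y)
```

Densifying trades memory for compatibility: a wide, mostly-zero feature matrix can grow substantially once converted, so this sketch is only reasonable for datasets that fit comfortably in RAM.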

i'm gonna close this, because i can't repro it on recent versions of auto_ml. so i think it's been fixed already.

thanks for filing the bug! would love to hear any other thoughts you have.