Neuraxio / Neuraxle

The world's cleanest AutoML library ✨ - Do hyperparameter tuning with the right pipeline abstractions to write clean deep learning production pipelines. Let your pipeline steps have hyperparameter spaces. Design steps in your pipeline like components. Compatible with Scikit-Learn, TensorFlow, and most other libraries, frameworks and MLOps environments.
https://www.neuraxle.org/
Apache License 2.0

Bug: Using SKlearn's BaggingRegressor in combination with AutoML not working #454

Closed adalli13 closed 3 years ago

adalli13 commented 3 years ago

Describe the bug: I tried to use a BaggingRegressor in an AutoML pipeline. However, this leads to an error because the base estimator of the BaggingRegressor is not JSON serializable.

The code excerpts are the following:

model_pipeline = Pipeline([
    SKLearnWrapper(
        BaggingRegressor(GradientBoostingRegressor(), random_state=5, n_jobs=-1),
        HyperparameterSpace({
            "n_estimators": RandInt(10, 100),
            "max_features": Uniform(0.6, 1.0),
        }),
    )
])

validation_splitter = KFoldCrossValidationSplitter(3)
scoring_callback = ScoringCallback(
    median_absolute_error, higher_score_is_better=False)

auto_ml = AutoML(
    pipeline=model_pipeline,
    hyperparams_optimizer=RandomSearchHyperparameterSelectionStrategy(),
    validation_splitter=validation_splitter,
    scoring_callback=scoring_callback,
    n_trials=10,
    epochs=1,
    hyperparams_repository=HyperparamsJSONRepository(cache_folder="cache"),
    refit_trial=True,
    continue_loop_on_error=False)

auto_ml = auto_ml.fit(X_train, y_train)

Executing this code with data leads to the following error:
Traceback (most recent call last):
  File "...", line 271, in <module>
    auto_ml = auto_ml.fit(X_train, y_train)
  File "...\lib\site-packages\neuraxle\base.py", line 3505, in fit
    new_self = self.handle_fit(data_container, context)
  File "...\lib\site-packages\neuraxle\base.py", line 981, in handle_fit
    new_self = self._fit_data_container(data_container, context)
  File "...\lib\site-packages\neuraxle\metaopt\auto_ml.py", line 815, in _fit_data_container
    repo_trial.update_final_trial_status()
  File "...\lib\site-packages\neuraxle\metaopt\trial.py", line 194, in update_final_trial_status
    self.save_trial()
  File "...\lib\site-packages\neuraxle\metaopt\trial.py", line 92, in save_trial
    self.save_trial_function(self)
  File "...\lib\site-packages\neuraxle\metaopt\auto_ml.py", line 101, in save_trial
    self._save_trial(trial)
    yield from _iterencode_dict(o, _current_indent_level)
  File "...\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "...\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "...\lib\json\encoder.py", line 438, in _iterencode
    o = _default(o)
  File "...\lib\json\encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type GradientBoostingRegressor is not JSON serializable

The same error also occurs if I try to use the SKLearnWrapper for the GradientBoostingRegressor class.

To Reproduce: neuraxle 0.5.7, scikit-learn 0.24.1

Expected behavior: Please tell me whether I need to use an estimator like BaggingRegressor in a different way :)

vincent-antaki commented 3 years ago

Hey @adalli13. This does indeed look like a problem with the SKLearnWrapper. I think it may not have been designed to handle sklearn BaseEnsemble instances, because only hyperparameters, not complete models, are supposed to be encoded by the HyperparamsJSONRepository. In your case, the sklearn model stored in the base_estimator attribute is being treated as if it were a hyperparameter sample.

I'll be looking more deeply into it in a couple of hours and will provide you with a workaround (or will fix it directly in the framework) today or tomorrow. Cheers.
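
The failure mode described above can be reproduced with the standard library alone. The following is a minimal sketch, not Neuraxle's actual code or its eventual fix: FakeEstimator is a hypothetical stand-in for the nested GradientBoostingRegressor, and the params dictionary is an assumed shape for what ends up being saved with a trial.

```python
import json


class FakeEstimator:
    """Hypothetical stand-in for a model object such as GradientBoostingRegressor."""


# Assumed shape of the saved values: plain hyperparameter samples plus the
# nested model object exposed under the name `base_estimator`.
params = {
    "n_estimators": 42,
    "max_features": 0.8,
    "base_estimator": FakeEstimator(),
}

# json.dumps only knows basic types, so the model object raises a TypeError.
try:
    json.dumps(params)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)


def is_json_serializable(value):
    """Return True if json.dumps accepts the value."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False


# Keeping only serializable entries leaves just the actual hyperparameters.
serializable_params = {
    key: value for key, value in params.items() if is_json_serializable(value)
}
print(json.dumps(serializable_params))
```

Filtering like this is only one possible way to keep model objects out of the JSON file; whether the framework filters, converts, or skips such values is not stated in this thread.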

vincent-antaki commented 3 years ago

Hey @adalli13, the fix for this problem has been merged into the master branch of Neuraxle and will be part of the 0.5.8 release. Your example should now work if you install neuraxle from the master branch. Please let me know if everything is good on your side and I'll close this issue. Thank you for reporting this problem.

adalli13 commented 3 years ago

Hi @vincent-antaki, thank you for solving the issue. From my point of view, it is resolved.