cl.estimate_cate_by_2_models() does not work with XGBoost version 1.0.2

kennethverstraete commented 4 years ago

I installed the newest release of causallift and xgboost (version 1.0.2), and estimate_cate_by_2_models() now raises a RuntimeError:

RuntimeError: Cannot clone object XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=None, colsample_bytree=1, gamma=0, gpu_id=None, importance_type='gain', interaction_constraints=None, learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=None, monotone_constraints=None, n_estimators=100, n_jobs=-1, nthread=None, num_parallel_tree=None, objective='binary:logistic', random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method=None, validate_parameters=False, verbose=0, verbosity=None), as the constructor either does not set or modifies parameter missing

When I use an older version of xgboost (version 0.90 like in the example notebook), it works again. An API change in version 1.0.0 of xgboost maybe?

Minyus commented 4 years ago

The error seems to reproduce using scikit-learn 0.22 or later. Could you try XGBoost 1.x with scikit-learn 0.21.3?

xgboost 1.0.1 did not reproduce the error with scikit-learn 0.21.3 in my environment.

kennethverstraete commented 4 years ago

Indeed, no error with sklearn 0.21.3 and an error with 0.22 or later!
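A small guard can make this version boundary explicit before the pipeline runs. The sketch below is only an illustration of the observation in this thread (0.21.3 works, 0.22 or later fails). The names `FIRST_FAILING_SKLEARN`, `parse_version`, and `sklearn_version_is_affected` are hypothetical and not part of causallift, and the thread does not say whether any later release changes the behaviour.

```python
import re

# Assumption taken from this thread: scikit-learn 0.21.3 worked,
# while 0.22 or later raised the "Cannot clone object" RuntimeError.
FIRST_FAILING_SKLEARN = (0, 22)


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            # Stop at the first non-numeric component, e.g. "post1".
            break
        parts.append(int(match.group()))
    return tuple(parts)


def sklearn_version_is_affected(version):
    """Hypothetical helper: True if the version falls in the range reported as failing."""
    return parse_version(version) >= FIRST_FAILING_SKLEARN


if __name__ == "__main__":
    for candidate in ("0.21.3", "0.22", "0.22.2.post1"):
        print(candidate, sklearn_version_is_affected(candidate))
```

In practice the string passed in would be `sklearn.__version__` from the active environment, and a `True` result would be a cue to pin scikit-learn to 0.21.3 as tested above.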

kennethverstraete commented 4 years ago

The entire error:

```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
in
----> 1 train_df, test_df = cl.estimate_cate_by_2_models()

~/PycharmProjects/bace/venv/lib/python3.6/site-packages/causallift/causal_lift.py in estimate_cate_by_2_models(self)
    639
    640 if self.runner:
--> 641 self.kedro_context.run(tags=["311_fit", "312_bundle_2_models"])
    642 self.uplift_models_dict = self.kedro_context.catalog.load(
    643 "uplift_models_dict"

~/PycharmProjects/bace/venv/lib/python3.6/site-packages/causallift/context/flexible_context.py in run(self, tags, runner, node_names, only_missing)
    177 )
    178 return super().run(
--> 179 tags=tags, runner=runner, node_names=node_names, only_missing=only_missing
    180 )

~/PycharmProjects/bace/venv/lib/python3.6/site-packages/causallift/context/flexible_context.py in run(self, **kwargs)
    139 ):
    140 # type: (...) -> Dict[str, Any]
--> 141 d = super().run(**kwargs)
    142 self.catalog.add_feed_dict(d, replace=True)
    143 return d

~/PycharmProjects/bace/venv/lib/python3.6/site-packages/causallift/context/flexible_context.py in run(self, runner, **kwargs)
    129 ParallelRunner() if runner == "ParallelRunner" else SequentialRunner()
    130 )
--> 131 return super().run(runner=runner, **kwargs)
    132
    133

~/PycharmProjects/bace/venv/lib/python3.6/site-packages/causallift/context/flexible_context.py in run(self, tags, runner, node_names, only_missing)
    104 if only_missing and _skippable(self.catalog):
    105 return runner.run_only_missing(pipeline, self.catalog)
--> 106 return runner.run(pipeline, self.catalog)
    107
    108

~/PycharmProjects/bace/venv/lib/python3.6/site-packages/kedro/runner/runner.py in run(self, pipeline, catalog)
     80 catalog.add(ds_name, self.create_default_data_set(ds_name))
     81
---> 82 self._run(pipeline, catalog)
     83
     84 self._logger.info("Pipeline execution completed successfully.")

~/PycharmProjects/bace/venv/lib/python3.6/site-packages/kedro/runner/sequential_runner.py in _run(self, pipeline, catalog)
     75 for exec_index, node in enumerate(nodes):
     76 try:
---> 77 run_node(node, catalog)
     78 done_nodes.add(node)
     79 except Exception:

~/PycharmProjects/bace/venv/lib/python3.6/site-packages/kedro/runner/runner.py in run_node(node, catalog)
    182 """
    183 inputs = {name: catalog.load(name) for name in node.inputs}
--> 184 outputs = node.run(inputs)
    185 for name, data in outputs.items():
    186 catalog.save(name, data)

~/PycharmProjects/bace/venv/lib/python3.6/site-packages/kedro/pipeline/node.py in run(self, inputs)
    420 except Exception as exc:
    421 self._logger.error("Node `%s` failed with error: \n%s", str(self), str(exc))
--> 422 raise exc
    423
    424 def _run_with_no_inputs(self, inputs: Dict[str, Any]):

~/PycharmProjects/bace/venv/lib/python3.6/site-packages/kedro/pipeline/node.py in run(self, inputs)
    411 outputs = self._run_with_one_input(inputs, self._inputs)
    412 elif isinstance(self._inputs, list):
--> 413 outputs = self._run_with_list(inputs, self._inputs)
    414 elif isinstance(self._inputs, dict):
    415 outputs = self._run_with_dict(inputs, self._inputs)

~/PycharmProjects/bace/venv/lib/python3.6/site-packages/kedro/pipeline/node.py in _run_with_list(self, inputs, node_inputs)
    458 )
    459 # Ensure the function gets the inputs in the correct order
--> 460 return self._decorated_func(*[inputs[item] for item in node_inputs])
    461
    462 def _run_with_dict(self, inputs: Dict[str, Any], node_inputs: Dict[str, str]):

~/PycharmProjects/bace/venv/lib/python3.6/site-packages/causallift/nodes/model_for_each.py in model_for_untreated_fit(*posargs, **kwargs)
    250
    251 def model_for_untreated_fit(*posargs, **kwargs):
--> 252 return ModelForUntreated().fit(*posargs, **kwargs)
    253
    254

~/PycharmProjects/bace/venv/lib/python3.6/site-packages/causallift/nodes/model_for_each.py in fit(self, args, df_)
     62
     63 else:
---> 64 model.fit(X_train, y_train)
     65
     66 best_estimator = (

~/PycharmProjects/bace/venv/lib/python3.6/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
    734 # of the params are estimators as well.
    735 self.best_estimator_ = clone(clone(base_estimator).set_params(
--> 736 **self.best_params_))
    737 refit_start_time = time.time()
    738 if y is not None:

~/PycharmProjects/bace/venv/lib/python3.6/site-packages/sklearn/base.py in clone(estimator, safe)
     80 raise RuntimeError('Cannot clone object %s, as the constructor '
     81 'either does not set or modifies parameter %s' %
---> 82 (estimator, name))
     83 return new_object
     84

RuntimeError: Cannot clone object XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=None, colsample_bytree=1, gamma=0, gpu_id=None, importance_type='gain', interaction_constraints=None, learning_rate=0.1, max_delta_step=0, max_depth=3, min_child_weight=1, missing=None, monotone_constraints=None, n_estimators=100, n_jobs=-1, nthread=None, num_parallel_tree=None, objective='binary:logistic', random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method=None, validate_parameters=False, verbose=0, verbosity=None), as the constructor either does not set or modifies parameter missing
```
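The last frames of the traceback show that the error comes from `clone` in `sklearn/base.py`, which complains when a constructor "either does not set or modifies" a parameter. The following is a pure-Python sketch of that kind of check, not scikit-learn's actual code. It assumes the comparison is by object identity, which is my reading of the message and is not confirmed in this thread. `strict_clone`, `WellBehaved`, and `Modifying` are hypothetical names used only for illustration.

```python
import copy


class WellBehaved:
    """Hypothetical estimator that stores its argument untouched."""

    def __init__(self, thresholds=None):
        self.thresholds = thresholds

    def get_params(self):
        return {"thresholds": self.thresholds}


class Modifying(WellBehaved):
    """Hypothetical estimator whose constructor rewrites its argument."""

    def __init__(self, thresholds=None):
        # Copying the list means the stored object is no longer
        # the object that was passed in.
        self.thresholds = list(thresholds or [])


def strict_clone(estimator):
    """Simplified stand-in for an identity-based clone check (assumption)."""
    params = estimator.get_params()
    new_params = {name: copy.deepcopy(value) for name, value in params.items()}
    new_estimator = type(estimator)(**new_params)
    stored = new_estimator.get_params()
    for name, passed in new_params.items():
        if stored[name] is not passed:
            raise RuntimeError(
                "Cannot clone object %r, as the constructor either does not "
                "set or modifies parameter %s" % (estimator, name)
            )
    return new_estimator


if __name__ == "__main__":
    strict_clone(WellBehaved(thresholds=[0.1, 0.9]))  # passes the check
    try:
        strict_clone(Modifying(thresholds=[0.1, 0.9]))
    except RuntimeError as exc:
        print(exc)
```

Under this assumption, any estimator whose constructor normalises or replaces an argument such as `missing` would fail the check, even if the resulting value is equivalent.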

Minyus commented 4 years ago

> Indeed, no error with sklearn 0.21.3 and an error with 0.22 or later!

Great!

I'm afraid that, at this moment, I do not plan to fix the issue with scikit-learn 0.22 because I'm not sure how to fix it and I haven't found any new features of scikit-learn 0.22 useful for CausalLift. Pull requests are welcome though.

Minyus / causallift

cl.estimate_cate_by_2_models() does not work with XGBoost version 1.0.2 #15