Describe the bug
sklearn.base.clone cannot correctly clone a CascadeForestClassifier/CascadeForestRegressor object that uses customized estimators.
To Reproduce
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.base import clone
from deepforest import CascadeForestRegressor
import xgboost as xgb
import lightgbm as lgb
X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
model = CascadeForestRegressor(random_state=1)
# set estimator
n_estimators = 4 # the number of base estimators per cascade layer
estimators = [lgb.LGBMRegressor(random_state=i) for i in range(n_estimators)]
model.set_estimator(estimators)
# set predictor
predictor = xgb.XGBRegressor()
model.set_predictor(predictor)
# clone model
model_new = clone(model)
# try to fit the cloned model (this is the call that raises)
model_new.fit(X_train, y_train)
Expected behavior
No error. The cloned model should keep the customized estimators and predictor and fit successfully.
Additional context
~/miniconda3/envs/pycaret/lib/python3.8/site-packages/deep_forest-0.1.5-py3.8-linux-x86_64.egg/deepforest/cascade.py in fit(self, X, y, sample_weight)
1004 if not hasattr(self, "predictor_"):
1005 msg = "Missing predictor after calling `set_predictor`"
-> 1006 raise RuntimeError(msg)
1007
1008 binner_ = Binner(
RuntimeError: Missing predictor after calling `set_predictor`
This bug occurs because, when a model with a customized predictor or customized estimators is cloned, the parameter predictor='custom' is copied to the clone, but self.predictor_ / self.dummy_estimators are not, which causes the error shown above.
I think this can be fixed easily by making the predictor and the list of estimators constructor parameters of CascadeForestClassifier/CascadeForestRegressor, as other meta-estimators do (e.g. ngboost), although the corresponding APIs may have to change.
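The failure mode can be reproduced without deepforest. The sketch below uses a hypothetical ToyCascade class and a simplified toy_clone helper standing in for sklearn.base.clone. It assumes only that clone rebuilds an estimator from its constructor parameters and does not copy any other attributes. All names here are illustrative and are not the deepforest implementation.

```python
import copy


def toy_clone(estimator):
    """Simplified stand-in for sklearn.base.clone (assumption: rebuild from params only)."""
    params = {k: copy.deepcopy(v) for k, v in estimator.get_params().items()}
    return type(estimator)(**params)


class ToyCascade:
    """Hypothetical estimator that stores a custom predictor through a setter."""

    def __init__(self, predictor="forest"):
        self.predictor = predictor

    def get_params(self):
        # Only constructor arguments are reported.
        return {"predictor": self.predictor}

    def set_predictor(self, predictor):
        self.predictor = "custom"    # constructor parameter: survives cloning
        self.predictor_ = predictor  # extra attribute: not copied by toy_clone

    def fit(self):
        if self.predictor == "custom" and not hasattr(self, "predictor_"):
            raise RuntimeError("Missing predictor after calling `set_predictor`")
        return self


model = ToyCascade()
model.set_predictor(object())  # any object stands in for XGBRegressor here
model_new = toy_clone(model)

print(model_new.predictor)               # the 'custom' flag was copied
print(hasattr(model_new, "predictor_"))  # the predictor object itself was not
```

Calling model_new.fit() in this sketch raises the same RuntimeError message as the traceback, while model.fit() succeeds.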
For example, the API parameters could be:
model = CascadeForestRegressor(
    estimators=[lgb.LGBMRegressor(random_state=i) for i in range(n_estimators)],
    predictor=xgb.XGBRegressor(),
)
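As a rough illustration of why constructor parameters would survive cloning, here is a minimal sketch. ToyCascadeWithParams, StubRegressor and toy_clone are hypothetical names invented for this example; toy_clone is a simplified stand-in for sklearn.base.clone that only rebuilds an object from the parameters it reports, and the estimators/predictor arguments mirror the proposal above rather than any existing deepforest API.

```python
import copy


def toy_clone(estimator):
    """Simplified stand-in for sklearn.base.clone: rebuild from constructor params."""
    params = {k: copy.deepcopy(v) for k, v in estimator.get_params().items()}
    return type(estimator)(**params)


class StubRegressor:
    """Placeholder for a real base learner such as LGBMRegressor or XGBRegressor."""

    def __init__(self, name):
        self.name = name


class ToyCascadeWithParams:
    """Hypothetical estimator that takes its components as constructor arguments."""

    def __init__(self, estimators=None, predictor=None):
        self.estimators = estimators
        self.predictor = predictor

    def get_params(self):
        # Everything needed to rebuild the object is reported here.
        return {"estimators": self.estimators, "predictor": self.predictor}


n_estimators = 4
model = ToyCascadeWithParams(
    estimators=[StubRegressor(f"lgb-{i}") for i in range(n_estimators)],
    predictor=StubRegressor("xgb"),
)
model_new = toy_clone(model)

print([e.name for e in model_new.estimators])  # custom estimators are present
print(model_new.predictor.name)                # custom predictor is present
```

Because the custom components are ordinary parameters in this design, the clone receives its own copies of them and no separate setter state has to be tracked.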