However, calling fit on gcv's best_estimator_ raises the following error:
gcv.best_estimator_.fit(X,y)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_9504/1281572257.py in <module>
----> 1 gcv.best_estimator_.fit(X,y)
~\Anaconda3\envs\\lib\site-packages\rulefit\rulefit.py in fit(self, X, y, feature_names)
416 self.tree_generator.set_params(random_state=i_size+random_state_add) # warm_state=True seems to reset random_state, such that the trees are highly correlated, unless we manually change the random_sate here.
417 self.tree_generator.get_params()['n_estimators']
--> 418 self.tree_generator.fit(np.copy(X, order='C'), np.copy(y, order='C'))
419 curr_est_=curr_est_+1
420 self.tree_generator.set_params(warm_start=False)
~\Anaconda3\envs\env\lib\site-packages\sklearn\ensemble\_gb.py in fit(self, X, y, sample_weight, monitor)
492 'warm_start==True'
493 % (self.n_estimators,
--> 494 self.estimators_.shape[0]))
495 begin_at_stage = self.estimators_.shape[0]
496 # The requirements of _decision_function (called in two lines
ValueError: n_estimators=1 must be larger or equal to estimators_.shape[0]=552 when warm_start==True
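The error comes from the state RuleFit leaves its internal tree_generator in after the grid search: warm_start=True with n_estimators reset to 1 while the already fitted estimators_ array still holds hundreds of trees. The sketch below reproduces that mechanism with plain scikit-learn (no RuleFit needed) and shows one possible workaround: cloning the estimator to get an unfitted copy with the same hyperparameters before refitting. The data and variable names here are illustrative, and the assumption that clone() also discards the stale fitted state of a nested estimator is mine, not something stated in the RuleFit docs.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.rand(50, 3)  # toy data, stands in for the real X, y
y = rng.rand(50)

gb = GradientBoostingRegressor(n_estimators=20, random_state=0)
gb.fit(X, y)

# RuleFit leaves its tree_generator in this state: warm_start=True
# with n_estimators reduced below the number of already fitted trees.
gb.set_params(warm_start=True, n_estimators=1)
try:
    gb.fit(X, y)
    err = None
except ValueError as e:
    # "n_estimators=1 must be larger or equal to estimators_.shape[0]=20 ..."
    err = str(e)
print(err)

# clone() returns an unfitted copy with the same hyperparameters,
# so the stale estimators_ array is discarded and fitting succeeds.
fresh = clone(gb)
fresh.set_params(warm_start=False, n_estimators=20)
fresh.fit(X, y)
print(fresh.estimators_.shape[0])
```

By the same reasoning, `clone(gcv.best_estimator_).fit(X, y)` may avoid the error for the RuleFit case, since the clone's tree_generator starts unfitted.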
-- version
scikit-learn 0.24 and 1.0
python 3.7
RuleFit 0.3
In order to optimize hyperparameters using sklearn's GridSearchCV, I think it is preferable to define a score function on the estimator itself. That said, passing an explicit scoring argument to GridSearchCV avoids the error during the search, so GridSearchCV itself can still be used today.