Ekeany / Boruta-Shap

A tree-based feature selection tool that combines the Boruta feature selection algorithm with Shapley values.
MIT License
559 stars, 86 forks

[BUG] #107

Open nickmeroi opened 1 year ago

nickmeroi commented 1 year ago

Hi, I'm trying to understand how the train_or_test parameter is used. From the code, I can see that when it is set to 'test', the data (X_boruta) is split into train and test sets and the model is fit on the training set (X_boruta_train). However, I don't understand how the importance is calculated on the test set only. The explain() function computes the SHAP values using the entire dataset (X_boruta), not the test set (X_boruta_test):

self.shap_values = np.array(explainer.shap_values(self.X_boruta))

Am I missing something?
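To make the question concrete, here is a minimal, library-free sketch of the difference I mean. It is not BorutaShap's code: it uses a toy linear model, for which the Shapley value of feature i is exactly w_i * (x_i - mean(x_i)) when features are treated as independent. The names `linear_shap`, `mean_abs_importance`, `weights`, and `baseline` are made up for illustration, and the weights are simply assumed to have been fitted on the training rows.

```python
import random


def linear_shap(weights, row, baseline):
    """Exact Shapley values for a linear model with independent features:
    phi_i = w_i * (x_i - baseline_i)."""
    return [w * (x - b) for w, x, b in zip(weights, row, baseline)]


def mean_abs_importance(weights, rows, baseline):
    """Mean absolute Shapley value per feature over the given rows."""
    totals = [0.0] * len(weights)
    for row in rows:
        for i, phi in enumerate(linear_shap(weights, row, baseline)):
            totals[i] += abs(phi)
    return [t / len(rows) for t in totals]


random.seed(0)
# Toy dataset: 100 rows, 2 features.
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(100)]

# Simple 70/30 split (stand-in for a train/test split).
split = 70
X_train, X_test = X[:split], X[split:]

# Hypothetical weights, assumed to come from fitting on X_train only.
weights = [2.0, 0.5]
# Background expectation taken from the training rows.
baseline = [sum(col) / len(col) for col in zip(*X_train)]

# Importance over ALL rows (what passing the full data would give).
importance_all = mean_abs_importance(weights, X, baseline)
# Importance over the held-out rows only (what 'test' suggests).
importance_test = mean_abs_importance(weights, X_test, baseline)

print("all rows :", importance_all)
print("test rows:", importance_test)
```

The two results differ because the importance is an average over whichever rows are explained. If train_or_test='test' is meant to score features on held-out data only, I would have expected the explainer to be called on X_boruta_test rather than X_boruta, but I may be misreading the intent.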