Ekeany / Boruta-Shap

A tree-based feature selection tool that combines the Boruta feature selection algorithm with Shapley values.
MIT License
559 stars, 86 forks

Hi, I am having trouble with a BorutaShap task that appears to be stuck at 0% progress. [BUG] #112

Open federiconuta opened 1 year ago

federiconuta commented 1 year ago

I have a dataset with 444 features and 120,975 rows. I split the data into train and test sets as follows:

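import numpy as np
from sklearn.model_selection import train_test_split

# First column is the target; work on a copy so df_75_85 is not modified
Y = df_75_85.iloc[:, 0].copy()
X = df_75_85.iloc[:, 1:443]

# Binarise the target at 0.5
Y[Y >= 0.5] = 1
Y[Y < 0.5] = 0

# Split the prepared X and Y so the binarised target is the one that is used
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.25)

# Replace infinities with a sentinel value and drop rows that contain NaN
x_train = x_train.replace([np.inf, -np.inf], 9999).dropna(axis=0)
x_test = x_test.replace([np.inf, -np.inf], 9999).dropna(axis=0)

# Keep the labels aligned with the rows that survived dropna
y_train = y_train.loc[x_train.index]
y_test = y_test.loc[x_test.index]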
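
Because dropna is applied to the feature frames only, the labels can end up with more rows than the features they belong to. The following is a minimal sketch on made-up data (the column names and values are hypothetical, not taken from the real dataset) of how the labels could be re-aligned by index after rows are dropped:

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical data): "target" is the label, f1 and f2 are features
df = pd.DataFrame({
    "target": [0.2, 0.7, 0.9, 0.1],
    "f1": [1.0, np.inf, 3.0, np.nan],
    "f2": [0.5, 0.1, -np.inf, 0.3],
})

# Binarise the label at 0.5
y = (df["target"] >= 0.5).astype(int)

# Replace infinities with a sentinel, then drop rows that still contain NaN
X = df[["f1", "f2"]].replace([np.inf, -np.inf], 9999).dropna(axis=0)

# dropna removed rows from X only, so pick the matching labels by index
y_aligned = y.loc[X.index]

print(len(y), len(X), len(y_aligned))
```

Selecting by index relies on the index values being unique, which holds for a default RangeIndex.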

I am trying to reproduce your class-imbalance notebook like this:

from sklearn.ensemble import RandomForestClassifier
from BorutaShap import BorutaShap

# Random forest with balanced class weights to account for the class imbalance
model = RandomForestClassifier(class_weight='balanced')

# classification=True marks this as a classification problem (False would mean regression)
Feature_Selector = BorutaShap(model=model,
                              importance_measure='shap',
                              classification=True)

Feature_Selector.fit(X=x_train, y=y_train, n_trials=100, random_state=0)

However, the progress bar stays at 0% for a long time. Is there a reason for this? If it depends on my data, is there a way to share the data with you? It would be interesting to understand the cases in which BorutaShap struggles.
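
One way to tell a slow run from a hung one is to time a single random forest fit on growing row subsamples and see how the cost scales. The sketch below uses synthetic data from scikit-learn as a stand-in for the real dataset. The sizes, the `n_estimators` value, and the helper name `time_single_fit` are assumptions chosen for illustration. It times only the model fit, so it should be read as a rough lower bound on the cost of one trial, not as a measurement of BorutaShap itself.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic, imbalanced stand-in for the real data (hypothetical sizes, kept small)
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=40, weights=[0.9, 0.1], random_state=0
)


def time_single_fit(X, y, n_rows):
    """Fit one balanced random forest on the first n_rows rows; return elapsed seconds."""
    model = RandomForestClassifier(
        n_estimators=50, class_weight="balanced", random_state=0
    )
    start = time.perf_counter()
    model.fit(X[:n_rows], y[:n_rows])
    return time.perf_counter() - start


# Time the fit at a few subsample sizes to see how the cost grows with rows
timings = {n: time_single_fit(X_demo, y_demo, n) for n in (500, 1000, 2000)}
for n, seconds in timings.items():
    print(f"{n} rows: {seconds:.2f} s")
```

If the fit time on a modest subsample is already large, extrapolating to the full row count and to 100 trials gives a sense of whether a bar sitting at 0% is plausible.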

Thank you