federiconuta opened this issue 1 year ago
I have a dataset with 444 features and 120975 rows. I split the data into train and test sets as follows:
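```python
import numpy as np
from sklearn.model_selection import train_test_split

# Column 0 is the target; columns 1 to 442 are used as features
X = df_75_85.iloc[:, 1:443]
Y = df_75_85.iloc[:, 0].copy()  # copy so the binarisation below does not modify df_75_85

# TRANSFORMING Y: binarise the target at 0.5
Y[Y >= 0.5] = 1
Y[Y < 0.5] = 0

# Split using the binarised Y, not the raw column 0
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.25)

# Replace infinities, drop rows containing NaN, then realign the targets with the remaining rows
x_train = x_train.replace([np.inf, -np.inf], 9999).dropna(axis=0)
x_test = x_test.replace([np.inf, -np.inf], 9999).dropna(axis=0)
y_train = y_train.loc[x_train.index]
y_test = y_test.loc[x_test.index]
```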
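The `dropna(axis=0)` calls remove rows from the feature frames only, so the targets need to be re-selected by index to keep `X` and `y` the same length. A minimal sketch of that cleaning step on a small synthetic frame (the helper name `clean_and_align`, the column names and the values are all hypothetical, chosen only for illustration):

```python
import numpy as np
import pandas as pd


def clean_and_align(x: pd.DataFrame, y: pd.Series, fill_value: float = 9999):
    """Replace +/-inf with fill_value, drop rows with NaN, and keep y aligned with x."""
    x_clean = x.replace([np.inf, -np.inf], fill_value).dropna(axis=0)
    # Select the same row labels from y so both objects share an identical index
    y_clean = y.loc[x_clean.index]
    return x_clean, y_clean


# Hypothetical toy data: row 1 contains a NaN, row 2 contains an infinity
x = pd.DataFrame({"f1": [0.1, np.nan, np.inf, 0.4],
                  "f2": [1.0, 2.0, 3.0, 4.0]})
y = pd.Series([0, 1, 1, 0])

x_clean, y_clean = clean_and_align(x, y)
print(len(x_clean), len(y_clean))  # 3 3
print(x_clean["f1"].tolist())      # [0.1, 9999.0, 0.4]
```

Without the `y.loc[...]` step, `x_clean` would have 3 rows while `y` still had 4, and any estimator fitted on the pair would either fail or pair features with the wrong labels.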
I am trying to reproduce your class-imbalance notebook like this:
```python
from BorutaShap import BorutaShap
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(class_weight='balanced')

# If no model is passed, the default is a Random Forest; if classification is False, it is treated as a regression problem
Feature_Selector = BorutaShap(model=model,
                              importance_measure='shap',
                              classification=True)

Feature_Selector.fit(X=x_train, y=y_train, n_trials=100, random_state=0)
```
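For reference, scikit-learn's `class_weight='balanced'` weights each class by `n_samples / (n_classes * count_of_that_class)`. A small sketch, using made-up labels with a 9:1 imbalance, that computes these weights by hand and checks them against `sklearn.utils.class_weight.compute_class_weight`:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced binary target: 90 zeros and 10 ones
y = np.array([0] * 90 + [1] * 10)

classes = np.unique(y)
# Formula behind class_weight='balanced': n_samples / (n_classes * bincount(y))
manual = len(y) / (len(classes) * np.bincount(y))

sklearn_weights = compute_class_weight("balanced", classes=classes, y=y)
print(manual)                                # class 0 gets ~0.556, class 1 gets 5.0
print(np.allclose(manual, sklearn_weights))  # True
```

The minority class therefore counts nine times as much as the majority class in each tree's split criterion, which is what the class-imbalance setup above relies on.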
However, the progress bar stays at 0% for a long time. Is there a reason for this? If it depends on my data, is there a way to share it with you? It would be interesting to understand the cases in which BorutaShap struggles.
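One likely contributor to the runtime is that Boruta-style selection trains on the original features plus a "shadow" copy of each one, where every shadow column is an independently shuffled version of the original. Each trial therefore fits the model (and, with `importance_measure='shap'`, computes SHAP values) on roughly twice as many columns. A minimal sketch of building such a shadow-augmented frame; it illustrates the idea only and is not BorutaShap's internal code, and the helper name `add_shadow_features` and the `shadow_` prefix are hypothetical:

```python
import numpy as np
import pandas as pd


def add_shadow_features(X: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Append one shuffled 'shadow' copy of every column of X (Boruta-style sketch)."""
    rng = np.random.default_rng(seed)
    shadow = pd.DataFrame(
        # Each shadow column holds the same values as the original, in random order,
        # which breaks any relationship between that column and the target.
        {f"shadow_{c}": rng.permutation(X[c].to_numpy()) for c in X.columns},
        index=X.index,
    )
    return pd.concat([X, shadow], axis=1)


# Hypothetical 4-row, 3-feature frame
X = pd.DataFrame(np.arange(12).reshape(4, 3), columns=["a", "b", "c"])
X_aug = add_shadow_features(X)
print(X_aug.shape)  # (4, 6)
```

Applied to the 442 columns selected by `iloc[:, 1:443]`, the same construction gives 884 columns per fit, on about 90,731 training rows (before `dropna`), repeated for each of the 100 trials.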
Thank you