Ekeany / Boruta-Shap

A tree-based feature selection tool that combines the Boruta feature selection algorithm with Shapley values.
MIT License
581 stars, 88 forks

Noob issue: Having trouble loading up my own dataset #16

Closed: apavlo89 closed this issue 4 years ago

apavlo89 commented 4 years ago

Sorry for the noob question, but I'm trying to load my own dataset and it's not working...

This is how I am trying to do it:

from BorutaShap import BorutaShap, load_data

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('D:/test/AAT.csv')
dataset = pd.get_dummies(dataset) 

# Saving feature names for later use
dataset_list = list(dataset.columns)

X = dataset.iloc[:, 1:].values 
y = dataset.iloc[:, 0].values 

# No model specified, so the default (Random Forest) is used; classification=False makes this a regression problem
Feature_Selector = BorutaShap(importance_measure='shap',
                              classification=False)

Feature_Selector.fit(X=X, y=y, n_trials=100, random_state=42)

# Plots a boxplot of the feature importances
Feature_Selector.plot(which_features='all')
Feature_Selector.results_to_csv(filename='feature_importance')

This gives me the error AttributeError: 'numpy.ndarray' object has no attribute 'columns'. Any idea what I am doing wrong? Thank you for your help.
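The error message points at the .values calls: dataset.iloc[:, 1:] is a pandas DataFrame, but .values converts it to a plain NumPy array, which has no column names. A minimal pandas-only sketch reproduces this without BorutaShap; the toy frame and its column names are hypothetical stand-ins for AAT.csv, and the X.columns line only mimics the kind of lookup BorutaShap performs (an assumption about where it fails internally).

```python
import pandas as pd

# Toy stand-in for AAT.csv (hypothetical columns): target first, features after
dataset = pd.DataFrame({
    "target": [1.0, 2.5, 3.1, 4.8],
    "age": [23, 35, 41, 52],
    "group": ["a", "b", "a", "b"],
})
dataset = pd.get_dummies(dataset)  # one-hot encodes the "group" column

X = dataset.iloc[:, 1:].values  # .values turns the DataFrame into a NumPy array
y = dataset.iloc[:, 0].values   # same for the target Series

print(type(X).__name__)       # ndarray
print(hasattr(X, "columns"))  # False: the column names are gone

message = None
try:
    X.columns  # mimics a column-name lookup like the one BorutaShap needs
except AttributeError as err:
    message = str(err)
print(message)
```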

apavlo89 commented 4 years ago

I fixed it by removing the .values from the X and y selections, so X stays a pandas DataFrame with its column names instead of becoming a NumPy array.
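A minimal sketch of that fix, assuming pandas and a toy frame with hypothetical columns in place of AAT.csv. Without .values, iloc returns a DataFrame for X (column names intact) and a Series for y. The BorutaShap call is left as a comment because it needs the package installed. The last lines show an optional alternative for code that already works with arrays: rebuilding a DataFrame from the saved dataset_list; dtype=float in get_dummies is an assumption added so the array stays numeric rather than mixing ints and booleans.

```python
import pandas as pd

# Toy stand-in for AAT.csv (hypothetical columns): target first, features after
dataset = pd.get_dummies(pd.DataFrame({
    "target": [1.0, 2.5, 3.1, 4.8],
    "age": [23, 35, 41, 52],
    "group": ["a", "b", "a", "b"],
}), dtype=float)
dataset_list = list(dataset.columns)  # ['target', 'age', 'group_a', 'group_b']

# Fix from the thread: no .values, so X stays a DataFrame and y a Series
X = dataset.iloc[:, 1:]
y = dataset.iloc[:, 0]
print(list(X.columns))  # ['age', 'group_a', 'group_b']
print(y.name)           # target

# These can then be passed to BorutaShap as in the question (needs the package):
# Feature_Selector.fit(X=X, y=y, n_trials=100, random_state=42)

# Alternative if you already hold NumPy arrays: rebuild a DataFrame from the
# column names saved earlier (dataset_list[0] is the target column here)
X_array = dataset.iloc[:, 1:].values
X_rebuilt = pd.DataFrame(X_array, columns=dataset_list[1:])
print(list(X_rebuilt.columns))  # ['age', 'group_a', 'group_b']
```

Dropping .values is the simpler route. Rebuilding from arrays works, but the rebuilt frame takes the array's single dtype (float here), so integer columns such as age come back as floats.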