Aritro-94 commented 4 years ago

xverse_titanic

I am trying to perform feature selection using xverse's VotingSelector on the Titanic dataset, which is a binary classification problem. The dataset also contains categorical variables, which I have one-hot encoded. I keep getting this error. I am using xverse version 1.0.5 and Python 3.6. Kindly help.

Sundar0989 commented 4 years ago

Could you please provide the code and some sample data?

Aritro-94 commented 4 years ago

This is my code and the data set is attached

dataset: titanic.zip

```python
import pandas as pd
import numpy as np

# Loading the dataset
df_orig = pd.read_csv("titanic.csv", sep=",")
df_orig.head()

# Data preparation
df = df_orig.drop(["Name"], axis=1)
df1 = pd.get_dummies(df)
df1.drop(["Sex_female"], axis=1, inplace=True)
ohe = pd.get_dummies(df["Pclass"], prefix="Pclass")
ohe.drop(["Pclass_3"], axis=1, inplace=True)
df1 = df1.join(ohe)
df1.drop(["Pclass"], axis=1, inplace=True)

# Splitting the dataset into training and testing sets
from sklearn.model_selection import train_test_split
X = df1.iloc[:, 1:]
y = df1.loc[:, ["Survived"]]
x_tr, x_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=11)

# Using VotingSelector
from xverse.ensemble import VotingSelector
clf = VotingSelector(selection_techniques=['RF', 'RFE', 'ETC', 'CS', 'L_ONE'])
clf.fit(x_tr, y_tr)  # At this step the error occurs
```

Sundar0989 commented 4 years ago

I finally figured it out. The error occurs because the target variable has the wrong shape.

The program expects a target of shape (len(data),), whereas the code above produces a target of shape (len(data), 1). Because of that, the binning function does not work. Please add these two lines of code after the train/test split, and it will work as intended. I will fix this in a future release so that you don't have to add them anymore. Thanks.

y_tr = y_tr.T.squeeze()
y_te = y_te.T.squeeze()
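
For anyone hitting the same error, here is a minimal sketch of why the shapes differ. It uses a small made-up frame rather than the Titanic data (the values and the `Age` column are purely illustrative), and it only demonstrates pandas behaviour, not xverse internals. `squeeze(axis=1)` is shown as an alternative pandas idiom that should give the same result as the two lines above.

```python
import pandas as pd

# Hypothetical toy frame standing in for the Titanic data (values are made up).
df = pd.DataFrame({"Survived": [0, 1, 1, 0], "Age": [22.0, 38.0, 26.0, 35.0]})

# Selecting with a list of labels keeps the result a DataFrame: shape (4, 1).
y_frame = df.loc[:, ["Survived"]]

# Selecting with a single label returns a Series: shape (4,).
y_series = df["Survived"]

# The workaround from this thread: transpose to (1, 4), then squeeze to a Series.
y_fixed = y_frame.T.squeeze()

# Alternative idiom: squeeze the single column directly.
y_alt = y_frame.squeeze(axis=1)

print(y_frame.shape)   # (4, 1)
print(y_series.shape)  # (4,)
print(y_fixed.shape)   # (4,)
print(y_alt.shape)     # (4,)
```

Selecting the target as `df1["Survived"]` instead of `df1.loc[:, ["Survived"]]` before the split would likely avoid the need for the extra lines altogether, since the target is then one-dimensional from the start.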

Sundar0989 / XuniVerse

local variable 'bins_X_grouped' referenced before assignment #3
