Deepthi10 / Intrusion-Detection-using-Machine-Learning-on-NSL--KDD-dataset

Pre-processing NSL-KDD dataset using Data mining techniques. Algorithm written in python to detect the attacks in NSL KDD dataset.

One Hot encoding #1

Open rcon313 opened 3 years ago

rcon313 commented 3 years ago

Hey, I am using some of this code for a project I am carrying out. When I get to the code below, I keep getting the error ValueError: Shape of passed values is (11850, 76), indices imply (11850, 84). Any idea why I would be getting this error? I don't know if it's the version of Python I am running. I looked online and it said to use ColumnTransformer. I am still a newbie to Python, so any help would be great.

One-Hot-Encoding

enc = OneHotEncoder()
data_categorical_values_encenc = enc.fit_transform(data_categorical_values_enc)
data_cat_data = pd.DataFrame(data_categorical_values_encenc.toarray(),columns=dumcols)

test set

data_test_categorical_values_encenc = enc.fit_transform(data_categorical_values_enc)
data_test_cat_data = pd.DataFrame(data_test_categorical_values_encenc.toarray(),columns=testdumcols)

esra9910 commented 1 year ago

Hey @rcon313, did you solve your problem? I have the same issue. Can you help me? Thank you very much!

Rahulrajesh15 commented 1 year ago

It's because the feature 'service' has a different number of categories in the training and testing data. So after one-hot encoding the training data, you need to find the categories that are present in the training data but not in the test data. Then add a column for each of those categories to the test data and fill it with 0. This will solve the issue.
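A minimal sketch of that fix, in case it helps. It uses pandas' get_dummies and DataFrame.reindex rather than the OneHotEncoder call from the question, and the tiny train_cat / test_cat frames are made-up stand-ins for the real NSL-KDD categorical columns, so treat the names and values as assumptions:

```python
import pandas as pd

# Hypothetical toy data: 'smtp' appears in the training set only.
train_cat = pd.DataFrame({"service": ["http", "ftp", "smtp", "http"]})
test_cat = pd.DataFrame({"service": ["http", "ftp", "ftp"]})

# One-hot encode each set separately; the test set ends up with fewer columns.
train_encoded = pd.get_dummies(train_cat, columns=["service"], dtype=int)
test_encoded = pd.get_dummies(test_cat, columns=["service"], dtype=int)

# Columns present in train but missing from test.
missing_in_test = sorted(set(train_encoded.columns) - set(test_encoded.columns))

# Add the missing columns filled with 0 and match the training column order.
# Note: reindex also drops any test-only columns that train does not have.
test_aligned = test_encoded.reindex(columns=train_encoded.columns, fill_value=0)

print(missing_in_test)
print(test_aligned.shape, train_encoded.shape)
```

After this step both frames have the same columns in the same order, so a model trained on one can be applied to the other. If you would rather stay with scikit-learn, another option is to fit the encoder on the training data only and call transform (not fit_transform) on the test data.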