choshin84 / learning_memo

personal learning memo

OneHotEncoding: Train/Test data set #25

Open choshin84 opened 4 years ago

choshin84 commented 4 years ago

Tweet summary

Prefer sklearn's OneHotEncoder with fit/transform so that the test data set is one-hot encoded using the categories learned from the training data set.
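A minimal sketch of the fit-on-train, transform-on-test pattern. The toy `color` arrays are made up for illustration, and `handle_unknown="ignore"` is one possible choice for dealing with categories that appear only in the test set.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical feature per row.
train = np.array([["red"], ["green"], ["blue"], ["green"]], dtype=object)
test = np.array([["green"], ["purple"]], dtype=object)  # "purple" is not in train

# Fit on the training data only, so the column layout is fixed by train.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train)

# Both sets are transformed with the same fitted encoder.
train_encoded = encoder.transform(train).toarray()
test_encoded = encoder.transform(test).toarray()

# Categories learned from train (sorted), one output column per category.
learned_categories = list(encoder.categories_[0])
```

Because the encoder is fitted once, `train_encoded` and `test_encoded` always have the same number of columns in the same order; the unseen `"purple"` row becomes all zeros rather than adding a new column.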

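When the `drop='first'` option is used to eliminate collinearity, the `handle_unknown='ignore'` option (ignore new labels) cannot be used together with it. This restriction applied to scikit-learn releases before 1.0.

# Useful link

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html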
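A small sketch of what `drop="first"` does, using made-up `color` data. It keeps the default `handle_unknown="error"`, so a label that was not seen during fit raises an error at transform time instead of being ignored.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data for illustration only.
train = np.array([["red"], ["green"], ["blue"]], dtype=object)

# drop="first" removes the first (sorted) category, here "blue",
# so k categories produce k - 1 columns.
encoder = OneHotEncoder(drop="first")
encoder.fit(train)
encoded = encoder.transform(train).toarray()

# With the default handle_unknown="error", an unseen label is rejected.
unseen = np.array([["purple"]], dtype=object)
try:
    encoder.transform(unseen)
    unseen_raised = False
except ValueError:
    unseen_raised = True
```

The dropped category `"blue"` is represented by an all-zero row, which is why an "ignored" unknown label would be indistinguishable from it.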
choshin84 commented 4 years ago

The category_encoders OneHotEncoder is handier when you want the category names to appear in the one-hot encoded output DataFrame: http://contrib.scikit-learn.org/categorical-encoding/onehot.html