fani-lab / Osprey

Online Predatory Conversation Detection

Learning Vector Representation #11

Open SarahSalamati opened 2 years ago

SarahSalamati commented 2 years ago

What is one-hot encoding? One-hot encoding is a way to prepare categorical data (variables whose values are labels rather than numbers) for a learning algorithm, and it can improve prediction on such data. Each distinct category value gets its own new column, which holds a binary value of 1 or 0. Each value is first mapped to an integer index and then represented as a binary vector: the position at that index is 1 and all other positions are 0.
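To make the definition concrete, here is a minimal sketch in plain Python. The function name one_hot_encode and the colour values are illustrative assumptions, not part of the original post; sorting the categories is just one way to get a stable value-to-index mapping.

```python
def one_hot_encode(values):
    """Map each value to a binary vector with a single 1 at its category index."""
    # Sorted unique categories give a deterministic value -> index mapping.
    categories = sorted(set(values))
    index_of = {category: i for i, category in enumerate(categories)}

    encoded = []
    for value in values:
        vector = [0] * len(categories)   # all positions start at 0
        vector[index_of[value]] = 1      # only the category's own index is 1
        encoded.append(vector)
    return categories, encoded


# Hypothetical sample data for illustration.
categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(categories)
for row in encoded:
    print(row)
```

Each row contains exactly one 1, and rows for the same value are identical.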

Why do we use it? It is useful when the category values have no natural order or numeric relationship to each other, because it avoids implying one, as plain integer codes would.
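The point about unrelated values can be illustrated with a small sketch. The colour labels and variable names below are assumptions chosen for the example. With integer codes, some categories look "closer" than others; with one-hot vectors, every pair of distinct categories is the same distance apart.

```python
import math

colours = ["red", "green", "blue"]  # hypothetical unordered categories

# Integer codes: red=0, green=1, blue=2 (an arbitrary order is imposed).
integer_code = {colour: i for i, colour in enumerate(colours)}

# One-hot vectors: a single 1 at each category's index.
one_hot = {
    colour: [1 if j == i else 0 for j in range(len(colours))]
    for i, colour in enumerate(colours)
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Integer codes suggest red is nearer to green than to blue.
int_gap_near = abs(integer_code["red"] - integer_code["green"])
int_gap_far = abs(integer_code["red"] - integer_code["blue"])

# One-hot vectors keep all distinct categories equally far apart.
hot_gap_near = euclidean(one_hot["red"], one_hot["green"])
hot_gap_far = euclidean(one_hot["red"], one_hot["blue"])

print(int_gap_near, int_gap_far)
print(hot_gap_near, hot_gap_far)
```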

One-hot encoding with pandas

The pandas library provides a function called get_dummies that performs one-hot encoding.

Sample code: one-hot encoding

    import pandas as pd

    # One categorical column of student names
    df = pd.DataFrame({"col": ["sara", "ehsan", "hossein", "negar", "hana"]})
    print("The original data")
    print(df)
    print("*" * 50)  # separator between the two printouts

    # Create one binary column per distinct name, prefixed with "student"
    df_encoded = pd.get_dummies(df, columns=["col"], prefix="student")
    print("The transformed data using get_dummies")
    print(df_encoded)

One-hot encoding with scikit-learn

In the scikit-learn library, the preprocessing module provides the OneHotEncoder class for one-hot encoding.

Coding
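The code for this section is not included in the post. As a minimal sketch of what it might look like, assuming the same student names as the pandas example and OneHotEncoder with its default settings, something like the following should work:

```python
from sklearn.preprocessing import OneHotEncoder

# Same illustrative names as the pandas example. scikit-learn expects 2D input
# of shape (n_samples, n_features), so each name is wrapped in its own row.
names = [["sara"], ["ehsan"], ["hossein"], ["negar"], ["hana"]]

encoder = OneHotEncoder()  # returns a sparse matrix by default
encoded = encoder.fit_transform(names).toarray()  # convert to a dense array

# categories_ holds the learned categories per input column, in sorted order;
# the columns of `encoded` follow that order.
print(encoder.categories_[0])
print(encoded)
```

Unlike get_dummies, the fitted encoder can be reused with encoder.transform on new data, which keeps the column layout consistent between training and prediction.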

hosseinfani commented 2 years ago

@SarahSalamati Good job. It would be awesome if you added some examples for clarification.

SarahSalamati commented 2 years ago

Thanks, Hossein. I have written example code with the pandas and scikit-learn libraries, so I will add it. Does that work?

hosseinfani commented 2 years ago

> Thanks, Hossein. I have written example code with the pandas and scikit-learn libraries, so I will add it. Does that work?

Sure! Put them here as well.