RoboticsClubIITJ / ML-DL-implementation

An implementation of ML and DL algorithms from scratch in Python using nothing but NumPy and Matplotlib.
BSD 3-Clause "New" or "Revised" License
48 stars, 69 forks

Implement One hot encoding #97

Closed Player0109 closed 3 years ago

Player0109 commented 3 years ago

Implemented the One hot encoding class.

Fixes #41

FUNCTIONS in the OneHotEncoder class are:
(1) FIT(INPUT_X, THRESHOLD) --- Counts the number of unique values in each column and decides whether that column should be encoded.
(2) CHECK_TRANSFORM(INPUT_X) --- Checks whether the data being transformed contains the same values as the data used to fit the encoder.
(3) TRANSFORM(INPUT_X) --- One-hot encodes the data, based on the data used to fit the encoder.
(4) FIT_TRANSFORM(INPUT_X, THRESHOLD) --- A combination of the fit and transform functions.

INPUTS in OneHotEncoder are:
X - A NumPy array of size n x m.
thresh - A threshold value. For each column, the ratio (NUMBER OF UNIQUE VALUES IN A COLUMN)/(LENGTH OF COLUMN) is calculated. A column whose ratio is below the input threshold is encoded; otherwise it is not.
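To illustrate the threshold rule described above, here is a minimal sketch. It is not the code from this PR; the helper name `should_encode` and the sample data are assumptions made for illustration only.

```python
import numpy as np


def should_encode(X, thresh):
    """Hypothetical helper: return a 0/1 flag per column.

    A column gets 1 when (unique values) / (column length) is below thresh.
    """
    n_rows, n_cols = X.shape
    flags = np.zeros(n_cols, dtype=int)
    for j in range(n_cols):
        ratio = len(np.unique(X[:, j])) / n_rows
        flags[j] = 1 if ratio < thresh else 0
    return flags


# Column 0 has 2 unique values in 4 rows (ratio 0.5);
# column 1 has 4 unique values in 4 rows (ratio 1.0).
X = np.array([
    ["red", "a1"],
    ["blue", "a2"],
    ["red", "a3"],
    ["blue", "a4"],
])
flags = should_encode(X, thresh=0.75)
print(flags)  # [1 0]: only column 0 falls below the threshold
```

With this rule, a low threshold restricts encoding to columns with few distinct values relative to their length, which keeps identifier-like columns from being expanded into many one-hot columns.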

VARIABLES in the OneHotEncoder class are:
ncols - Stores the number of columns in the fit data.
arr_dic - An array of dictionaries, where each dictionary holds the label-encoded values of a particular column.
arr_nunique - An array that stores the number of unique values in each column.
encode - An array with one entry per column of the fit data. An entry is 1 if the column is to be encoded, otherwise 0.
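The functions and variables described above could fit together roughly as follows. This is a sketch under stated assumptions, not the implementation merged in this PR: the class name `OneHotEncoderSketch`, the lowercase method names, the choice to raise `ValueError` on unseen values, and the purely numeric example data are all assumptions.

```python
import numpy as np


class OneHotEncoderSketch:
    """Hypothetical sketch using the attribute names from the description."""

    def fit(self, X, thresh):
        n_rows, self.ncols = X.shape
        self.arr_dic = []      # one {value: index} dictionary per column
        self.arr_nunique = []  # number of unique values per column
        self.encode = []       # 1 if the column is to be encoded, else 0
        for j in range(self.ncols):
            uniques = np.unique(X[:, j])
            self.arr_dic.append({v: i for i, v in enumerate(uniques)})
            self.arr_nunique.append(len(uniques))
            self.encode.append(1 if len(uniques) / n_rows < thresh else 0)
        return self

    def check_transform(self, X):
        # The data must have the same column count, and every encoded
        # column may only contain values seen during fit.
        if X.shape[1] != self.ncols:
            return False
        for j in range(self.ncols):
            if self.encode[j] and any(v not in self.arr_dic[j] for v in X[:, j]):
                return False
        return True

    def transform(self, X):
        if not self.check_transform(X):
            raise ValueError("data does not match the data used to fit")
        blocks = []
        for j in range(self.ncols):
            col = X[:, j]
            if self.encode[j]:
                onehot = np.zeros((len(col), self.arr_nunique[j]))
                idx = [self.arr_dic[j][v] for v in col]
                onehot[np.arange(len(col)), idx] = 1
                blocks.append(onehot)
            else:
                # Columns that are not encoded pass through unchanged.
                blocks.append(col.reshape(-1, 1))
        return np.hstack(blocks)

    def fit_transform(self, X, thresh):
        return self.fit(X, thresh).transform(X)


X = np.array([[0, 1.5], [1, 2.5], [0, 3.5], [1, 4.5]])
encoder = OneHotEncoderSketch()
encoded = encoder.fit_transform(X, thresh=0.75)
print(encoded.shape)  # (4, 3): column 0 expands to two columns, column 1 is kept
```

Keeping `check_transform` separate from `transform` lets a caller test new data for unseen values without triggering an error.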

Player0109 commented 3 years ago

@rohansingh9001, I have made the changes you requested. Please review it once more.