Content Enhancement : KNN

AM1CODES commented 3 years ago

The below mentioned additions are required -

Math - We need to add a solved example to show the working of the algorithm. To be precise, a pen and paper example of the math behind the algorithm with help of images like we have used for other algorithms.

sudharsansrini commented 3 years ago

KNN - K Nearest Neighbor

KNN algorithm is mainly used to classify and predicting the values in Machine Learning. We can use it for Supervised Machine Learning problems(Labeled) such that Classification and Regression.

KNN
K - constant value NN - Nearest Neighbor

KNN is simply , Finding the K Nearest Neighbor of the data point and classifying the data points with maximum number of classes of the neighbors

20210831_155414_1

We can find the value of K by finding the mean of the error in each records and plotting as graph K is the value where the mean error rate is lesser

error =[]
for i in range(1, 40):
    model = KNeighborsClassifier(n_neighbors=i)
    model.fit(x_train, y_train)
    pred_i = model.predict(x_test)
    error.append(np.mean(pred_i != y_test))

plt.figure(figsize=(15, 15))
plt.plot(range(1, 40), error, color = "red", linestyle = "dashed", marker="o", markerfacecolor="blue", markersize=10)
plt.title('Error Rate K value')
plt.xlabel('K value')
plt.ylabel('Mean Error')
plt.show()

Finding K -> Finding how many neighbors that we have to choose around the data point

Figure_1

Value of k = less error rate After choosing K value, You may have a question like, How will it find the k nearest points ?

Distance of the data points will be calculated in KNN using Minkowski distance.

Based on Minkowski Distance metric, We gonna classify the data points. p=1, Manhattan Distance p=2, Euclidean Distance

Manhattan Distance

1_KDgfdK6SooXtaUvlnXdpaA

This is the distance between real vectors using sum of their absolute difference.

Euclidean Distance

1_9LeaMTcOXxeTPN-VCbKloQ

Euclidean distance is calculated as the square root of sum of the squared differences between a new point (x) & existing point y

Hamming Distance

1_1lXpSrDyY37V4Ke--GkRQQ

It is used for categorical variables. If the value x and the value y are the same, the distance d will be equal to 0. Otherwise distance equal to 1

The above methods are used to find the distance and to finding the k closest neighbor of that particular data point. With the classes of neighbors, We can classify the data point with maximum neighbor's class

Thank you!

AM1CODES commented 3 years ago

Hey @sudharsansrini the content that you mentioned above looks really amazing and we could incorporate it in the main platform. I am pretty sure that writing this would have taken a good amount of time and i request you to make a PR for the same because otherwise it won't get counted as your contribution. Feel free to join our discord if you need any help with how to contribute and stuff like that.

OneStep-elecTRON / onestep-electron-content