Closed RosalieSherry closed 3 years ago
Hi Rosalie,
This looks cool, but I think you meant to submit it as a New Page rather than as an Issue. See Nick's guide here: https://lost-stats.github.io/Contributing/Contributing.html
I'll close in the meantime.
I think this is an existing page, but doesn't have R. Editing it into the existing page rather than submitting as an Issue would be a good idea. Can you do that Rosalie?
R
The simplest way to perform KNN in R is with the package class. Its knn function is user friendly and does not require you to compute distances yourself, as it always uses Euclidean distance. For more advanced kinds of nearest-neighbor matching, it would be best to use the matchit function from the MatchIt package. To verify results, this example also uses the confusionMatrix function from the package caret.
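As a rough illustration of the knn call described above, here is a minimal sketch. It assumes the built-in iris data as a stand-in for wdbc.csv, and the 70/30 split, the seed, and k = 5 are arbitrary choices for the illustration. A plain table() is used for the check so that the sketch only needs the class package.

```r
library(class)  # provides knn()

set.seed(42)    # arbitrary seed so the split is reproducible
data(iris)      # stand-in data (assumption); the original example uses wdbc.csv

# Hold out 30% of the rows for testing
test_idx <- sample(nrow(iris), size = round(0.3 * nrow(iris)))
train_x <- iris[-test_idx, 1:4]       # the four numeric predictors
test_x  <- iris[test_idx, 1:4]
train_y <- iris$Species[-test_idx]    # target as a factor vector
test_y  <- iris$Species[test_idx]

# Each test row is classified by majority vote among its k nearest
# training rows, using Euclidean distance
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 5)

# Predicted vs. actual classes
print(table(Predicted = pred, Actual = test_y))
```

With caret installed, the same pred and test_y factors could be passed to confusionMatrix instead of table().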
Given how this package is designed, the easiest place to make an error is during normalization: normalizing character variables, or other variables that do not need it. Another common source of error is omitting drop = TRUE when extracting your target (y) vector, which will prevent the model from running. Finally, because of the way this example verifies results, it is vital to convert the target into a factor, since the predicted and actual values must be of the same type for R to give you an output.
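The three pitfalls above can be sketched as follows. This is only an illustration under stated assumptions: iris again stands in for wdbc.csv, and normalize is a hypothetical helper name for min-max scaling, not a function from the class package.

```r
# Hypothetical helper: min-max scaling of a numeric vector to [0, 1]
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

df <- iris  # stand-in data (assumption); the original example uses wdbc.csv

# Pitfall 1: normalize only the numeric columns, never character/factor ones
num_cols <- sapply(df, is.numeric)
df[num_cols] <- lapply(df[num_cols], normalize)

# Pitfall 2: drop = TRUE returns the target as a plain vector rather than
# a one-column data frame
y <- df[, "Species", drop = TRUE]

# Pitfall 3: make sure the target is a factor, so it can be compared
# with the factor of predictions
y <- as.factor(y)

str(df)
str(y)
```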
References for R walkthrough
The dataset used is the Breast Cancer Wisconsin (Diagnostic) Data Set from the UCI Machine Learning Repository. The RDocumentation page for knn was used to work through this example, along with Statology's "How to Create a Confusion Matrix". Data file: wdbc.csv