UBC-MDS / KmeansR

Documentation
https://ubc-mds.github.io/KmeansR/index.html

Kmeans


Package description

This package provides R functions that implement k-means clustering from scratch. It works on any dataset with valid numerical features and includes fit, predict, and clustersummary functions, as well as elbow and silhouette methods for choosing the hyperparameter “k”. The Usage section below gives a high-level overview of each function. See each function’s documentation for more details.

R already has a built-in k-means function (kmeans() in the stats package). This package is not meant to add to the existing ecosystem; rather, it is intended to deepen fundamental understanding of these algorithms.
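For comparison, here is a minimal sketch of the built-in function on the same toy data that the Usage example uses. The seed and the `nstart` value are arbitrary choices made for this illustration.

```r
# Toy data, identical to the data frame in the Usage example
X <- data.frame(x1 = c(1, 2, 3, 5, 53, 21, 43),
                x2 = c(1, 2, 3, 5, 53, 21, 43))

set.seed(123)  # k-means starts from random centers, so fix the seed
base_fit <- stats::kmeans(X, centers = 2, nstart = 10)

base_fit$cluster       # integer cluster label for each row of X
base_fit$centers       # matrix with one row per cluster center
base_fit$tot.withinss  # total within-cluster sum of squares (inertia)
```

Running this next to the package's own fit function is one way to sanity-check a from-scratch implementation, keeping in mind that cluster numbering is arbitrary and may differ between the two.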

Installation

To use this package, install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("UBC-MDS/KmeansR")

Dependencies

Usage

The following example walks through a typical workflow: fit a model, predict labels for new data, summarize the clusters, and choose a value of k.

First, load the required packages and fit the data:

library(Kmeans)
library(tidyverse)
library(dplyr)

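# Toy data set with two numeric features
X <- data.frame(x1 = c(1, 2, 3, 5, 53, 21, 43),
                x2 = c(1, 2, 3, 5, 53, 21, 43))
# Fit k-means with k = 2; the result holds the centers and labels used below
kmeans_results <- fit(X, 2)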
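
For intuition about what a from-scratch fit involves, the following is a minimal sketch of the standard Lloyd's algorithm in base R. It is not the package's implementation: `lloyd_kmeans` is a hypothetical helper written for this illustration, and the package's own initialization and stopping rules may differ.

```r
# Hypothetical helper, not part of the package: a bare-bones Lloyd's algorithm
lloyd_kmeans <- function(X, k, max_iter = 100) {
  X <- as.matrix(X)
  # Initialise the centers with k distinct rows chosen at random
  centers <- X[sample(nrow(X), k), , drop = FALSE]
  labels <- rep(0L, nrow(X))

  for (iter in seq_len(max_iter)) {
    # Squared Euclidean distance from every point (rows) to every center (columns)
    d2 <- sapply(seq_len(k), function(j) {
      rowSums(sweep(X, 2, centers[j, ])^2)
    })
    d2 <- matrix(d2, nrow = nrow(X))
    new_labels <- max.col(-d2, ties.method = "first")  # index of the nearest center

    if (all(new_labels == labels)) break  # assignments unchanged: converged
    labels <- new_labels

    # Move each center to the mean of the points currently assigned to it
    for (j in seq_len(k)) {
      members <- labels == j
      if (any(members)) {
        centers[j, ] <- colMeans(X[members, , drop = FALSE])
      }
    }
  }
  list(centers = centers, labels = labels)
}

set.seed(42)  # arbitrary seed so the random initialisation is repeatable
toy <- data.frame(x1 = c(1, 2, 3, 5, 53, 21, 43),
                  x2 = c(1, 2, 3, 5, 53, 21, 43))
sketch_fit <- lloyd_kmeans(toy, 2)
sketch_fit$centers
sketch_fit$labels
```

Because the starting centers are random, different seeds can converge to different local optima, which is why practical implementations often rerun the algorithm several times and keep the best result.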

Use the fitted model to predict labels for new data:

X_new <- data.frame(x1 = c(1, 4),
                    x2 = c(3, 2))
predict(X_new, kmeans_results$centers)
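
Conceptually, predicting with k-means means assigning each new point to its nearest center. The sketch below shows that idea in base R. `nearest_center` is a hypothetical helper, and the centers and the third test point are made up for the illustration rather than taken from a fitted model.

```r
# Hypothetical helper: label each row of new_points with the index of its nearest center
nearest_center <- function(new_points, centers) {
  new_points <- as.matrix(new_points)
  centers <- as.matrix(centers)
  # Squared Euclidean distance from each point (rows) to each center (columns)
  d2 <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(new_points, 2, centers[j, ])^2)
  })
  d2 <- matrix(d2, nrow = nrow(new_points))
  max.col(-d2, ties.method = "first")
}

# Illustrative centers (assumed values, not the output of fit())
example_centers <- rbind(c(2.75, 2.75), c(39, 39))
new_points <- data.frame(x1 = c(1, 4, 40),
                         x2 = c(3, 2, 38))
predicted <- nearest_center(new_points, example_centers)
predicted  # 1 1 2
```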

Use the clustersummary function to get information on the fitted model:

clustersummary(X, kmeans_results$centers, kmeans_results$labels)
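
The exact output of clustersummary is described in its documentation. As a rough idea of what a cluster summary can contain, the sketch below builds one by hand in base R, reporting the size, center, and within-cluster sum of squares of each cluster. The labels here are assumed for the illustration and the column names are hypothetical.

```r
toy <- data.frame(x1 = c(1, 2, 3, 5, 53, 21, 43),
                  x2 = c(1, 2, 3, 5, 53, 21, 43))
toy_labels <- c(1, 1, 1, 1, 2, 2, 2)  # assumed labelling, not the output of fit()

# Per-cluster feature means, i.e. the cluster centers
means <- aggregate(toy, by = list(cluster = toy_labels), FUN = mean)

# Within-cluster sum of squared deviations from each cluster's center
wss <- sapply(split(toy, toy_labels), function(g) sum(scale(g, scale = FALSE)^2))

summary_sketch <- data.frame(
  cluster   = means$cluster,
  size      = as.vector(table(toy_labels)),
  center_x1 = means$x1,
  center_x2 = means$x2,
  within_ss = as.vector(wss)
)
summary_sketch
```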

If you are unsure which value of k to choose, use the elbow and silhouette functions:

centers <- c(2, 3, 4, 5)
inertia <- elbow(X, centers)$inertia

k_vector <- c(2, 3, 4, 5)
scores <- silhouette(X, k_vector)$scores
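
To show what these two diagnostics measure, here is a minimal base R sketch that computes both by hand. It uses stats::kmeans as a stand-in for the package's fit function, and `mean_silhouette` is a hypothetical helper, so the numbers need not match what elbow and silhouette return. The helper assumes at least two clusters.

```r
toy <- data.frame(x1 = c(1, 2, 3, 5, 53, 21, 43),
                  x2 = c(1, 2, 3, 5, 53, 21, 43))
k_values <- c(2, 3, 4, 5)

set.seed(1)  # arbitrary seed for repeatable random starts

# Elbow method: total within-cluster sum of squares (inertia) for each k
inertia_sketch <- sapply(k_values, function(k) {
  stats::kmeans(toy, centers = k, nstart = 10)$tot.withinss
})

# Hypothetical helper: mean silhouette width for one labelling
mean_silhouette <- function(X, labels) {
  d <- as.matrix(dist(X))  # pairwise Euclidean distances
  clusters <- unique(labels)
  widths <- sapply(seq_len(nrow(d)), function(i) {
    own <- labels == labels[i]
    if (sum(own) == 1) return(0)          # convention: singleton clusters score 0
    a <- sum(d[i, own]) / (sum(own) - 1)  # mean distance within own cluster, self excluded
    # Smallest mean distance to the points of any other cluster
    b <- min(sapply(setdiff(clusters, labels[i]), function(cl) {
      mean(d[i, labels == cl])
    }))
    (b - a) / max(a, b)
  })
  mean(widths)
}

# Silhouette method: mean silhouette width for each k
scores_sketch <- sapply(k_values, function(k) {
  mean_silhouette(toy, stats::kmeans(toy, centers = k, nstart = 10)$cluster)
})
```

With the elbow method, look for the k after which inertia stops dropping sharply. With the silhouette method, prefer the k with the highest mean score; scores lie between -1 and 1, and higher values indicate tighter, better separated clusters.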

Tests

To test that the functions work as intended, run devtools::test() from the root of the project repo in an R console.