Mu-Sigma / muHVT

Constructs hierarchical Voronoi tessellations for a given data set and overlays heatmaps for variables at various levels of the tessellations for in-depth data analysis. Credits to Mu Sigma for their continuous support throughout the development of the package.
https://cran.r-project.org/web/packages/muHVT/index.html

IMPORTANT NOTICE: This repository has been archived. All future updates and releases will be made available in the Mu-Sigma/HVT repository. Similarly, the CRAN package muHVT has been discontinued, and all future releases will be made available in the HVT package.

muHVT: Collection of functions used to build hierarchical topology preserving maps

Zubin Dowlaty, Shubhra Prakash, Sangeet Moy Das, Shantanu Vaidya, Praditi Shah, Srinivasan Sudarsanam, Somya Shambhawi

2023-06-07

* [1 Abstract](#abstract)
* [2 Version History](#version-history)
    * [2.1 muHVT (v23.06.07) | What’s New?](#muhvt-(v23.06.07)-whats-new)
    * [2.2 muHVT (v22.12.06)](#muhvt-(v22.12.06))
* [3 Installation of muHVT (v23.06.07)](#installation-of-muhvt-(v23.06.07))
* [4 Vignettes](#vignettes)
    * [4.1 muHVT Vignette](#muhvt-vignette)
    * [4.2 muHVT Model Diagnostics Vignette](#muhvt-model-diagnostics-vignette)
    * [4.3 muHVT : Predicting Cells with Layers using predictLayerHVT](#muhvt---predicting-cells-with-layers-using-predictLayerHVT)

# 1 Abstract

The muHVT package is a collection of R functions that facilitate building [topology preserving maps](https://users.ics.aalto.fi/jhollmen/dippa/node9.html#:~:text=The%20property%20of%20topology%20preserving,tool%20of%20high%2Ddimensional%20data) for rich multivariate data analysis; see `Figure 1` for an example of a 2D torus map generated with the package. The package is geared toward big data, in particular data sets with a large number of rows. Its functions are organized around the following typical workflow:

1. **Data Compression**: Vector quantization (VQ) or hierarchical vector quantization (HVQ), using means or medians. This step compresses the rows (long data frame) using a compression objective.
2. **Data Projection**: Dimension projection of the compressed cells to 1D, 2D or 3D with Sammon's nonlinear mapping algorithm. This step creates the coordinates of the topology preserving map (also called an [embedding](https://en.wikipedia.org/wiki/Embedding)) in the desired output dimension.
3. **Tessellation**: Creation of the cells required for object visualization using the Voronoi tessellation method. The package includes heatmap plots for hierarchical Voronoi tessellations (HVT). This step enables data insights, visualization, and interaction with the topology preserving map, which is useful for semi-supervised tasks.
4. **Prediction**: Scoring of new data sets and recording of their cell assignments using the map objects from the steps above, in a sequence of maps if required.

The muHVT package allows the creation of visually striking tessellations that showcase the power of topology preserving maps. Below is an image depicting a tessellation of a torus; see the [vignette](https://htmlpreview.github.io/?https://github.com/Somya545/muHVT/blob/master/vignettes/muHVT_vignette.html) for more details.

Figure 1: The Voronoi tessellation for layer 1 and number of cells 900 with the heat map overlaid for variable z.
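
To make the workflow in the abstract more concrete, here is a minimal sketch of the compression, projection and prediction steps. It deliberately does not call muHVT. It uses `kmeans()` from base R as a stand-in for vector quantization and `sammon()` from the MASS package for the projection. The toy data, the number of cells and every variable name are assumptions made for illustration only, and the tessellation step is omitted.

```r
# Illustrative sketch only: base R + MASS, not the muHVT API.
library(MASS)  # provides sammon()

set.seed(42)

# Hypothetical toy data: 500 rows, 3 numeric variables.
toy <- matrix(rnorm(500 * 3), ncol = 3,
              dimnames = list(NULL, c("x", "y", "z")))

# 1. Data compression: vector quantization via k-means.
#    Each centroid stands in for the rows assigned to it.
n_cells <- 15  # assumed number of cells for this example
vq <- kmeans(toy, centers = n_cells, nstart = 5, iter.max = 50)
centroids <- vq$centers                 # n_cells x 3 codebook

# 2. Data projection: Sammon's mapping of the centroids to 2D.
proj <- sammon(dist(centroids), k = 2, trace = FALSE)
coords_2d <- proj$points                # n_cells x 2 map coordinates

# 3. Tessellation (omitted): Voronoi cells would be built around coords_2d.

# 4. Prediction: assign a new row to the cell with the nearest centroid.
new_row <- c(x = 0.1, y = -0.3, z = 1.2)
sq_dist <- rowSums(sweep(centroids, 2, new_row)^2)
assigned_cell <- unname(which.min(sq_dist))
```

The sketch projects the centroids rather than the raw rows, which mirrors the order of the steps above: compress first, then embed the much smaller codebook.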

# 2 Version History
## 2.1 muHVT (v23.06.07) | What’s New?

07th June, 2023

This version of the muHVT package introduces the following new feature: functionality to predict cells with layers based on a sequence of maps using `predictLayerHVT`.

## 2.2 muHVT (v22.12.06)

06th December, 2022

This version provides functionality to predict based on a sequence of maps. Creating a predictive set of maps involves three steps:

1. **Compress:** Compress the dataset using a percentage compression rate and a quantization threshold with the `HVT()` function (Map A).
2. **Remove novelty cells:** Manually identify and remove the novelty cells from the dataset using the `removeNovelty()` function (Map B).
3. **Compress the dataset without novelty:** Compress the novelty-free dataset again using n_cells, depth and a quantization threshold with the `HVT()` function (Map C).

The diagram below illustrates these steps.

Figure 2: Flow diagram for predicting based on a sequence of maps using predictLayerHVT()
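
The idea of a sequence of maps can be sketched loosely in base R. The example below is not the muHVT implementation and does not use `HVT()`, `removeNovelty()` or `predictLayerHVT()`. It uses `kmeans()` as a stand-in for compression, and it replaces the manual identification of novelty cells with an assumed distance cut-off so that the script runs unattended. The toy data, the cut-off value, the helper functions `nearest_cell()` and `score_row()`, and all variable names are hypothetical. A separate map of the novelty data (Map B) is left out for brevity.

```r
# Illustrative sketch only: base R, not the muHVT API.
set.seed(7)

# Hypothetical data: a dense cloud plus a few far-away "novelty" rows.
bulk <- matrix(rnorm(400 * 2), ncol = 2)
odd  <- matrix(rnorm(6 * 2, mean = 8, sd = 0.2), ncol = 2)
dat  <- rbind(bulk, odd)

# Step 1 (Map A): compress the full data set.
map_a <- kmeans(dat, centers = 10, nstart = 5, iter.max = 50)

# Step 2: flag novelty cells. The package leaves this choice to the analyst;
# the cut-off of 5 below is an assumption that suits this toy data.
data_center   <- apply(dat, 2, median)
cell_dist     <- sqrt(rowSums(sweep(map_a$centers, 2, data_center)^2))
novelty_cells <- unname(which(cell_dist > 5))

is_novel     <- map_a$cluster %in% novelty_cells
novelty_data <- dat[is_novel, , drop = FALSE]
clean_data   <- dat[!is_novel, , drop = FALSE]

# Step 3 (Map C): compress the data again without the novelty rows.
map_c <- kmeans(clean_data, centers = 8, nstart = 5, iter.max = 50)

# Scoring a new row: look it up in Map A first; rows that land in a
# novelty cell are reported as such, all others are scored on Map C.
nearest_cell <- function(row, centers) {
  unname(which.min(rowSums(sweep(centers, 2, row)^2)))
}

score_row <- function(row) {
  cell_a <- nearest_cell(row, map_a$centers)
  if (cell_a %in% novelty_cells) {
    list(layer = "novelty", cell = cell_a)
  } else {
    list(layer = "map_c", cell = nearest_cell(row, map_c$centers))
  }
}

typical <- score_row(c(0, 0))
```

Routing each new row through Map A first keeps the decision about novelty separate from the final cell assignment, which is the point of using more than one map.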

# 3 Installation of muHVT (v23.06.07)

    library(devtools)
    devtools::install_github(repo = "Mu-Sigma/muHVT")

# 4 Vignettes

The following are links to the vignettes for the muHVT package:

## 4.1 muHVT Vignette

[**muHVT Vignette:**](https://htmlpreview.github.io/?https://github.com/Somya545/muHVT/blob/master/vignettes/muHVT_vignette.html) Contains descriptions of the functions used for vector quantization and construction of hierarchical Voronoi tessellations for data analysis.

## 4.2 muHVT Model Diagnostics Vignette

[**muHVT Model Diagnostics Vignette:**](https://htmlpreview.github.io/?https://github.com/Somya545/muHVT/blob/master/vignettes/muHVT_model_diagnostics_vignette.html) Contains descriptions of the functions used to perform model diagnostics and validation for the muHVT model.

## 4.3 muHVT - Predicting Cells with Layers using predictLayerHVT

[**muHVT: Predicting Cells with Layers using predictLayerHVT:**](https://htmlpreview.github.io/?https://github.com/Somya545/muHVT/blob/master/vignettes/Predicting_Cells_with_Layers_using_predictLayerHVT.html) Contains descriptions of the functions used for predicting cells with layers based on a sequence of maps using `predictLayerHVT`.