KechrisLab / MSPrep

A processing pipeline for the summarization, normalization and diagnostics of mass spectrometry–based metabolomics data.
MSPrep

Introduction

MSPrep provides a convenient set of functions for the pre-analytic processing of mass spectrometry-based metabolomics data. Functions are included for the following steps, which are commonly performed before such data are analyzed:

  1. Summarization of technical replicates (if available)
  2. Filtering of metabolites
  3. Imputation of missing values
  4. Transformation, normalization, and batch correction
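
As a rough illustration of step 1, the sketch below averages technical replicates in base R. It is a toy example and does not reproduce MSPrep's own summarization rules, which are described in the vignette; the matrix, the `<subject>_<replicate>` column naming, and the use of 0 as the missing-value code are all assumptions made for this example.

```r
# Toy abundance matrix: rows are compounds, columns are replicate injections.
# Column names follow a hypothetical "<subject>_<replicate>" pattern.
abund <- matrix(c(10, 12, 11, 20,  0, 22,
                   5,  0,  0,  8,  9,  7),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("cmpd1", "cmpd2"),
                                c("s1_a", "s1_b", "s1_c",
                                  "s2_a", "s2_b", "s2_c")))

# Assumption for this toy data: 0 encodes a missing value
abund[abund == 0] <- NA

# The subject ID is the part of the column name before the underscore
subject <- sub("_.*$", "", colnames(abund))

# Average the replicates within each subject, ignoring missing values
summarized <- sapply(unique(subject), function(s) {
  rowMeans(abund[, subject == s, drop = FALSE], na.rm = TRUE)
})

print(summarized)
```

The result has one column per subject instead of one column per replicate, which is the shape the later filtering and imputation steps work on.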

The original manuscript was published in Bioinformatics, and the package is hosted on Bioconductor.

Additional helpful links:

  1. Vignette providing detailed instructions with examples
  2. Reference Manual describing function usage

Installation

Install via Bioconductor:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")

BiocManager::install("MSPrep")

Install via GitHub:

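# Install devtools only if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
# Install MSPrep from the GitHub repository
devtools::install_github("KechrisLab/MSPrep")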

Examples

Two examples are provided below. For more detail, see the package vignette, which is available on Bioconductor or, once the package is installed, through the following R command:

vignette("using_MSPrep", package = "MSPrep")

The following code loads the example data set msquant, summarizes its technical replicates, filters metabolites by keeping only those present in at least 80% of samples, imputes missing values using k-nearest neighbors, applies a log base 10 transformation, and finally normalizes and batch corrects the data using quantile normalization and ComBat batch correction. The data are then returned as a data.frame.

library(MSPrep)
data(msquant)

preparedDF <- msPrepare(msquant,
                        minPropPresent = 1/3,
                        missingValue = 1,
                        filterPercent = 0.8,
                        imputeMethod = "knn",
                        transform = "log10",
                        normalizeMethod = "quantile + ComBat",
                        covariatesOfInterest = c("spike"),
                        compVars = c("mz", "rt"),
                        sampleVars = c("spike", "batch", "replicate",
                                       "subject_id"),
                        colExtraText = "Neutral_Operator_Dif_Pos_",
                        separator = "_")
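
To make the filtering and transformation steps more concrete, here is a minimal base R sketch of what a presence filter at 0.8 followed by a log base 10 transform does to a small matrix. This is not MSPrep's implementation: the toy data are invented, missing values are marked with NA rather than the `missingValue = 1` code used above, and the threshold is assumed to be inclusive.

```r
# Toy data: 3 compounds x 5 samples; NA marks a missing abundance
x <- rbind(cmpdA = c(100, 120,  NA, 110, 130),  # present in 4 of 5 samples
           cmpdB = c( NA,  NA,  50,  NA,  60),  # present in 2 of 5 samples
           cmpdC = c( 10,  20,  30,  40,  50))  # present in 5 of 5 samples

# Proportion of samples in which each compound was observed
propPresent <- rowMeans(!is.na(x))

# Keep compounds observed in at least 80% of samples (assumed inclusive)
filtered <- x[propPresent >= 0.8, , drop = FALSE]

# Log base 10 transform; missing values stay NA until they are imputed
logged <- log10(filtered)

print(logged)
```

Only cmpdA and cmpdC survive the filter. The remaining NA in cmpdA is the kind of gap that the imputation step is meant to fill.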

The second example uses the data set COPD_131. The raw data set is available at Metabolomics Workbench. The code loads the data set, summarizes its technical replicates, filters metabolites by keeping only those present in at least 80% of samples, imputes missing values using BPCA imputation, and finally normalizes the data using median normalization. The data are then returned as a SummarizedExperiment by setting the argument returnToSE = TRUE.

library(MSPrep)
data(COPD_131)

preparedSE <- msPrepare(COPD_131,
                        minPropPresent = 1/3,
                        filterPercent = 0.8,
                        missingValue = 0,
                        imputeMethod = "bpca",
                        nPcs = 3,
                        normalizeMethod = "median",
                        transform = "none",
                        compVars = c("Mass", "Retention.Time",
                                     "Compound.Name"),
                        sampleVars = c("subject_id", "replicate"),
                        colExtraText = "X",
                        separator = "_",
                        returnToSE = TRUE)
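
For readers unfamiliar with median normalization, the base R sketch below shows one common variant on the raw abundance scale: each sample is rescaled so that its median matches the median of all sample medians. Whether MSPrep uses exactly this variant is an assumption here, so check the reference manual for details; the toy matrix is invented for illustration.

```r
# Toy data: 4 compounds x 3 samples on the raw abundance scale
y <- cbind(s1 = c(10, 20, 30, 40),
           s2 = c(20, 40, 60, 80),
           s3 = c( 5, 10, 15, 20))

# Median abundance of each sample (column)
sampleMedians <- apply(y, 2, median)

# Common target: the median of the per-sample medians
target <- median(sampleMedians)

# Divide each column by its scaling factor so every sample median equals target
normalized <- sweep(y, 2, sampleMedians / target, FUN = "/")

print(apply(normalized, 2, median))
```

After rescaling, the three samples share the same median, so differences in overall signal intensity between samples no longer dominate comparisons between compounds.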

Bug Reports

Report bugs by opening a new issue on the GitHub repository.