
cellhashR

An R package designed to demultiplex cell hashing data. Please see our documentation for more detail.


Overview

Cell hashing is a method that enables sample multiplexing and super-loading on single-cell RNA-seq platforms such as 10x Genomics. It was originally developed at the New York Genome Center in collaboration with the Satija lab. See here for more detail on the technique. The general idea is that cells are labeled with a staining reagent (such as an antibody) tagged with a short nucleotide barcode. Other staining methods have been published, such as the lipid-based MULTI-seq (https://www.ncbi.nlm.nih.gov/pubmed/31209384). In all methods, the hashtag oligo (barcode) is sequenced in parallel with cellular mRNA, producing a separate cell hashing library. After sequencing, the cell barcodes and hashtag indexes are parsed with tools like CITE-seq-Count (https://github.com/Hoohm/CITE-seq-Count), yielding a count matrix of hashtag counts for each cell.
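To make the count matrix and the demultiplexing problem concrete, the base R sketch below builds a toy hashtag-by-cell matrix and applies a deliberately naive caller that assigns each cell to its dominant hashtag. This is not one of the algorithms cellhashR implements: the function name naive_demux, its thresholds (min_total, min_fraction), the labels and all counts are hypothetical. Real hashing data are much noisier than this, which is why dedicated algorithms are needed.

```r
# Toy hashtag count matrix (hypothetical values): rows are hashtags (HTOs),
# columns are cell barcodes.
counts <- matrix(
  c(950,  12,   8, 3,
     10, 870, 400, 1,
      5,   9, 380, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("HTO-1", "HTO-2", "HTO-3"),
                  c("CELL_A", "CELL_B", "CELL_C", "CELL_D"))
)

# Hypothetical naive caller, NOT a cellhashR algorithm:
# - cells with too few total counts are called "Negative";
# - cells whose top hashtag holds at least min_fraction of counts get that hashtag;
# - everything else is called "Doublet".
naive_demux <- function(counts, min_total = 10, min_fraction = 0.8) {
  totals <- colSums(counts)                     # total hashtag counts per cell
  top_idx <- apply(counts, 2, which.max)        # row index of the dominant hashtag
  top_frac <- apply(counts, 2, max) / totals    # share of counts in that hashtag
  calls <- ifelse(totals < min_total, "Negative",
                  ifelse(top_frac >= min_fraction,
                         rownames(counts)[top_idx], "Doublet"))
  names(calls) <- colnames(counts)
  calls
}

print(naive_demux(counts))
```

Here CELL_C splits its counts almost evenly between HTO-2 and HTO-3, so the naive rule labels it a doublet, while CELL_D has too few counts to call at all.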

Once the count matrix is created, an algorithm must be used to demultiplex cells and assign them to hashtags (i.e., samples). This is where cellhashR comes in. This package provides several functions:

Each step of the workflow can either be run interactively in R (through the terminal or RStudio), or it can be executed as a pipeline that runs all commands and creates the call table and an HTML report.

Click here to view an example QC report

Consensus Calling

In addition to letting you run multiple demultiplexing algorithms and compare their results, cellhashR can generate a consensus call from them. This is useful because no single algorithm performs best under all conditions. The consensus call is built automatically into the data frame returned by GenerateCellHashingCalls(). Some additional parameters that might be worth considering are:
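As a rough illustration of the idea behind a consensus call, the base R sketch below combines per-method calls with a simple majority vote. It is not cellhashR's implementation: the function name majority_consensus, the strict-majority threshold, the "Discordant" label and the per-method calls are all hypothetical.

```r
# Hypothetical illustration of a simple majority-vote consensus; this is NOT
# cellhashR's implementation. Each column holds one method's call per cell
# (invented values).
calls_by_method <- data.frame(
  multiseq = c("HTO-1", "HTO-2", "Negative", "HTO-3",   "HTO-4"),
  htodemux = c("HTO-1", "HTO-2", "HTO-1",    "Doublet", "HTO-2"),
  bff      = c("HTO-1", "HTO-3", "Negative", "HTO-3",   "Negative"),
  stringsAsFactors = FALSE
)

# For each cell, return the call made by more than `threshold` of the methods,
# otherwise "Discordant". Name, threshold and label are assumptions.
majority_consensus <- function(calls, threshold = 0.5) {
  apply(calls, 1, function(row) {
    votes <- table(row)                      # number of methods per distinct call
    top <- names(votes)[which.max(votes)]    # most frequent call (first on ties)
    if (max(votes) / length(row) > threshold) top else "Discordant"
  })
}

consensus <- majority_consensus(calls_by_method)
print(consensus)
```

With three methods, two agreeing calls form a majority; the last cell, where every method disagrees, is left uncalled rather than forced into a sample.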

Example Usage

Below are the primary functions of cellhashR needed to QC and score hashing data:

# Example 1: parse CITE-seq-Count output, printing QC
barcodeData <- ProcessCountMatrix(rawCountData = 'myCountDir/umi_count', minCountPerCell = 5)

# Example 2: parse CITE-seq-Count output, providing a barcode whitelist. 
barcodeData <- ProcessCountMatrix(rawCountData = 'myCountDir/umi_count', minCountPerCell = 5, barcodeWhitelist = c('HTO-1', 'HTO-2', 'HTO-3', 'HTO-4', 'HTO-6'))

# Create QC plots of barcode normalization
PlotNormalizationQC(barcodeData)

# Generate the final cell hashing calls
calls <- GenerateCellHashingCalls(barcodeMatrix = barcodeData, methods = c('multiseq', 'htodemux'))

# Inspect negative cells:
SummarizeCellsByClassification(calls = calls, barcodeMatrix = barcodeData)

Alternatively, export a template R Markdown file that outlines the default workflow and can be run interactively or headlessly as part of a pipeline:

GetExampleMarkdown(dest = 'cellhashR_template.rmd')

Finally, the whole workflow can be executed with this wrapper around the R Markdown template, which produces a TSV of calls and an HTML QC report:

CallAndGenerateReport(rawCountData = 'myCountDir/umi_count', reportFile = 'report.html', callFile = 'calls.txt', barcodeWhitelist = c('HTO-1', 'HTO-2', 'HTO-3'), title = 'Cell Hashing For Experiment 1')
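The minCountPerCell and barcodeWhitelist arguments used in the examples above can be pictured as two filters on the hashtag-by-cell count matrix. The base R sketch below is a hypothetical illustration of that idea, not cellhashR's code: the function filter_counts, the toy counts and the order of the two filtering steps are assumptions, and ProcessCountMatrix() may perform additional QC.

```r
# Toy count matrix (hypothetical values): rows are hashtags, columns are cells.
counts <- matrix(
  c(950,  12, 3, 40,
     10, 870, 1, 25,
      5,   9, 0, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("HTO-1", "HTO-2", "HTO-7"),
                  c("CELL_A", "CELL_B", "CELL_C", "CELL_D"))
)

# Hypothetical sketch, NOT cellhashR's implementation.
filter_counts <- function(counts, min_count_per_cell = 5, whitelist = NULL) {
  if (!is.null(whitelist)) {
    # Keep only hashtags (rows) named in the whitelist.
    counts <- counts[rownames(counts) %in% whitelist, , drop = FALSE]
  }
  # Drop cells (columns) whose total over the remaining hashtags is too low.
  counts[, colSums(counts) >= min_count_per_cell, drop = FALSE]
}

filtered <- filter_counts(counts, min_count_per_cell = 5,
                          whitelist = c("HTO-1", "HTO-2"))
print(filtered)
```

In this toy case HTO-7 is removed because it is not whitelisted, and CELL_C is dropped because only 4 whitelisted hashtag counts remain for it.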

Installation

# Make sure to update your Rprofile to include Bioconductor repos, such as adding this line to ~/.Rprofile:
local({options(repos = BiocManager::repositories())})

# Latest version:
devtools::install_github(repo = 'bimberlab/cellhashR', ref = 'master', dependencies = TRUE, upgrade = 'always')

Pre-packaged Docker images with all required dependencies are available on our GitHub Packages page. We recommend using a specific release, which you can select by replacing "latest" in the command below with that release's tag:

docker pull ghcr.io/bimberlab/cellhashr:latest

Known Issues

If you receive an error along the lines of:

"ERROR; return code from pthread_create() is 22\n"

Please manually install preprocessCore with threading disabled:

devtools::install_github('bmbolstad/preprocessCore', dependencies = TRUE, upgrade = 'always', configure.args = '--disable-threading')

Providing an h5 file to demuxEM/demuxmix

Unlike the other algorithms, which only require the HTO count matrix, demuxEM and demuxmix also require the path to the 10x gene expression counts in h5 format, supplied through the rawFeatureMatrixH5 argument. This example runs BFF and demuxEM:

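  # CITE-seq-Count HTO output and the matching 10x raw feature matrix (.h5)
  rawData <- '../testdata/438-21-GEX/umi_count'
  h5File <- '../testdata/438-21-GEX/438-21-raw_feature_bc_matrix.h5'
  # Parse the HTO counts, keeping only the two hashtags used in this experiment
  barcodeMatrix <- ProcessCountMatrix(rawCountData = rawData, barcodeWhitelist = c('MS-11', 'MS-12'))
  # Run BFF and demuxEM; demuxEM additionally needs the gene expression counts in h5File
  df <- GenerateCellHashingCalls(barcodeMatrix = barcodeMatrix, methods = c('bff_cluster', 'demuxem'), rawFeatureMatrixH5 = h5File)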

Development Guidelines