Guide: Running the Single Cell RNA-seq Pipeline using Toil

This guide walks the user through running this pipeline from start to finish. If you have any questions, please contact John Vivian. If you find any errors or have corrections, feel free to make a pull request. Feedback of any kind is appreciated.


RNA-seq fastqs generated from 10x Chromium single-cell experiments are quantified to produce a gene-by-cell matrix. Additional QC plots are also generated.

This pipeline produces a tarball (tar.gz) file for a given sample that contains n subdirectories:

The name of the output tarball begins with the UUID of the sample (e.g. UUID.tar.gz).
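To check what a particular output tarball contains, its top-level entries can be listed with the Python standard library. This is a minimal Python 3 sketch and not part of the pipeline; the function name top_level_entries and the path UUID.tar.gz are placeholders chosen for illustration.

```python
import tarfile


def top_level_entries(tarball_path):
    """Return the sorted top-level names found inside a tar archive."""
    names = set()
    # "r:*" lets tarfile detect the compression (gzip for .tar.gz).
    with tarfile.open(tarball_path, "r:*") as tar:
        for member in tar.getmembers():
            # Treat "./a/b" and "a/b" alike: keep the first real component.
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            if parts:
                names.add(parts[0])
    return sorted(names)


# Example (placeholder path):
#     for name in top_level_entries("UUID.tar.gz"):
#         print(name)
```

The same listing is available from the shell with tar -tzf UUID.tar.gz.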


This pipeline has been tested on Ubuntu 14.04, but should also run on other Unix-based systems. apt-get and pip often require sudo privileges, so if the commands below fail, try prepending sudo. If you do not have sudo privileges, you will need to build these tools from source, or bug a sysadmin about how to get them (they don't mind).

General Dependencies

1. Python 2.7
2. Curl         apt-get install curl
3. Docker

Python Dependencies

1. Toil         pip install toil
2. S3AM         pip install --pre s3am (optional, needed for uploading output to S3)
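Before running the pipeline, it can help to confirm that the tools listed above are on the PATH. The following is a minimal Python 3 sketch using shutil.which; the executable names (curl, docker, toil, s3am) are assumptions taken from the install commands above, and the helper itself is not part of the pipeline.

```python
import shutil

# Executable names are assumptions based on the install commands above.
REQUIRED_TOOLS = ["curl", "docker", "toil"]
OPTIONAL_TOOLS = ["s3am"]  # only needed for uploading output to S3


def find_missing(tools):
    """Return the entries of `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def report():
    """Print one line per missing tool; print nothing if all are present."""
    for tool in find_missing(REQUIRED_TOOLS):
        print("missing required tool: " + tool)
    for tool in find_missing(OPTIONAL_TOOLS):
        print("missing optional tool: " + tool)
```

Calling report() prints nothing when every tool is found.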

The CGL RNA-seq pipeline requires an index file in order to run. This file is hosted on Synapse and can be downloaded after creating an account, which is free and takes about 1 minute.

All samples and inputs must be submitted as URLs. The following URL schemes are supported: http://, file://, s3://, ftp://.
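A quick way to catch a bare local path or an unsupported scheme before launching a run is to check each sample URL up front. This is a minimal Python 3 sketch, not the pipeline's own validation; the names SUPPORTED_SCHEMES, has_supported_scheme and to_file_url are made up for illustration. It accepts only the four schemes listed above, so an https:// URL is rejected here simply because it is not in that list.

```python
from pathlib import Path
from urllib.parse import urlparse

# The four schemes named in this guide.
SUPPORTED_SCHEMES = {"http", "file", "s3", "ftp"}


def has_supported_scheme(url):
    """True if `url` starts with one of the schemes listed in this guide."""
    return urlparse(url).scheme in SUPPORTED_SCHEMES


def to_file_url(local_path):
    """Turn a local path into an absolute file:// URL."""
    return Path(local_path).resolve().as_uri()
```

For example, has_supported_scheme("/data/sample.tar") is False because a bare path has no scheme, while to_file_url("/data/sample.tar") produces a file:// URL that passes the check.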

Samples consisting of tarballs with fastq files inside must follow the file name convention of ending in R1/R2 or _1/_2, followed by .fastq.gz, .fastq, .fq.gz or .fq.
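The naming convention above can be expressed as a regular expression and used to check file names before building a sample tarball. This is a minimal Python 3 sketch of one reading of that rule; the pipeline's actual matching logic may differ, and the names FASTQ_NAME, read_number and split_reads are hypothetical.

```python
import re

# Name must end in R1/R2 or _1/_2, then .fastq or .fq, optionally .gz.
FASTQ_NAME = re.compile(r"(R[12]|_[12])\.(fastq|fq)(\.gz)?$")


def read_number(filename):
    """Return 1 or 2 for a conforming fastq name, or None otherwise."""
    match = FASTQ_NAME.search(filename)
    if match is None:
        return None
    # The matched tag is "R1", "R2", "_1" or "_2"; its last character
    # is the read number.
    return int(match.group(1)[-1])


def split_reads(filenames):
    """Group names into (read 1 files, read 2 files, non-conforming)."""
    r1, r2, rejected = [], [], []
    for name in filenames:
        number = read_number(name)
        if number == 1:
            r1.append(name)
        elif number == 2:
            r2.append(name)
        else:
            rejected.append(name)
    return r1, r2, rejected
```

Any name that ends up in the third list would need renaming before the tarball is submitted.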

General Usage

Type toil-rnaseq-sc to get a basic help menu and instructions.

  1. Type toil-rnaseq-sc generate to create an editable manifest and config in the current working directory.
  2. Parameterize the pipeline by editing the config.
  3. Fill in the manifest with information pertaining to your samples.
  4. Type toil-rnaseq-sc run [jobStore] to execute the pipeline.

Example Commands

Run sample(s) locally using the manifest

  1. toil-rnaseq-sc generate
  2. Fill in config and manifest
  3. toil-rnaseq-sc run ./example-jobstore

Toil options can be appended to toil-rnaseq-sc run, for example: toil-rnaseq-sc run ./example-jobstore --retryCount=1 --workDir=/data

For a complete list of Toil options, type toil-rnaseq-sc run -h

Run a variety of samples locally

  1. toil-rnaseq-sc generate-config
  2. Fill in config
  3. toil-rnaseq-sc run ./example-jobstore --retryCount=1 --workDir=/data --samples s3://example-bucket/sample_1.tar file:///full/path/to/sample_2.tar
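When the sample list is long, the command line can be assembled by a small script instead of being typed by hand. The following Python 3 sketch only builds and prints the command; it uses just the options shown in this guide (run, the job store, --retryCount, --workDir, --samples), and the helper names build_run_command and as_shell are hypothetical.

```python
import shlex


def build_run_command(job_store, samples, retry_count=1, work_dir=None):
    """Assemble the argument list for a local run with explicit samples."""
    argv = ["toil-rnaseq-sc", "run", job_store,
            "--retryCount={}".format(retry_count)]
    if work_dir is not None:
        argv.append("--workDir={}".format(work_dir))
    if samples:
        # All sample URLs follow a single --samples flag, as in step 3.
        argv.append("--samples")
        argv.extend(samples)
    return argv


def as_shell(argv):
    """Render an argument list as one shell-quoted command line."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

Printing as_shell(build_run_command(...)) gives a line that can be pasted into a terminal, and the same argument list can be passed to subprocess.run once it looks right.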

Example Config

kallisto-index: s3://cgl-pipeline-inputs/rnaseq_cgl/kallisto_hg38.idx
output-dir: /data/my-toil-run
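A config like the one above can be sanity-checked before a run. The following Python 3 sketch assumes the config consists of flat key: value lines, as in the example; the generated config may contain more keys, and a real YAML parser would be the more robust choice. The function names and the idea that both keys must be non-empty are assumptions made for illustration.

```python
EXAMPLE_CONFIG = """\
kallisto-index: s3://cgl-pipeline-inputs/rnaseq_cgl/kallisto_hg38.idx
output-dir: /data/my-toil-run
"""


def parse_flat_config(text):
    """Parse 'key: value' lines; blank lines and '#' comments are skipped."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URL values stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("not a 'key: value' line: " + line)
        config[key.strip()] = value.strip()
    return config


def missing_keys(config, expected=("kallisto-index", "output-dir")):
    """Return the expected keys that are absent or empty (assumed set)."""
    return [key for key in expected if not config.get(key)]
```

An empty list from missing_keys(parse_flat_config(text)) means both keys from the example are filled in.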

Distributed Run

Reference Data

