Version 1.0.0 | Updated August 2017
Author: Carlos Guzman
E-mail: cag104@ucsd.edu
CIPHER is a data processing workflow platform for next-generation sequencing data, including ChIP-seq, RNA-seq, DNase-seq, MNase-seq, ATAC-seq and GRO-seq. By taking advantage of the Nextflow language and Singularity containers, CIPHER provides an easy-to-use and reproducible pre-processing workflow toolkit.
CIPHER has a built-in help command. For more information regarding possible parameters and their meanings, open a terminal and type:
```
nextflow run cipher.nf --help
```
Download or `git clone` this repository and install dependencies.
The only dependencies required to run CIPHER are:
Config files are tab-separated text files with 5 columns for single-end data and 6 columns for paired-end data.
Single-end CONFIG:
```
sample1	sample1_rep1	/path/to/fastq.gz	control1	sample1
sample2	sample2_rep1	/path/to/fastq.gz	control1	input
```
Paired-end CONFIG:
```
sample1	sample1_rep1	/path/to/fastq_R1.gz	/path/to/fastq_R2.gz	control1	sample1
sample2	sample2_rep1	/path/to/fastq_R1.gz	/path/to/fastq_R2.gz	control1	input
```
DO NOT MIX SINGLE-END AND PAIRED-END DATA IN THE SAME CONFIG FILE. CIPHER DOES NOT HANDLE THIS USE CASE YET.
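Where columns refer to:
- Control column (4th column for single-end data, 5th for paired-end data): use `-` if no input file is available or needed (as is the case in RNA-seq/GRO-seq/MNase-seq/etc.).
- Last column: use `input` if that sample corresponds to an input file. Otherwise use the `MergeID`.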
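Because a malformed or mixed config only fails once the workflow is already running, it can help to check the file first. The shell function below is a minimal sketch and is not part of CIPHER; the name `validate_cipher_config` is hypothetical. It assumes blank lines may be skipped and that the layout rules are exactly the ones stated above: 5 tab-separated columns for single-end rows, 6 for paired-end rows, and no mixing of the two.

```shell
# validate_cipher_config CONFIG
# Hypothetical helper (not part of CIPHER). Checks that:
#   - every non-blank line has 5 (single-end) or 6 (paired-end) tab-separated columns
#   - all lines use the same column count (no mixing single-end and paired-end rows)
# On success prints "single-end" or "paired-end"; on failure prints an error
# to stderr and returns 1.
validate_cipher_config() {
  awk -F'\t' '
    NF == 0 { next }                      # skip blank lines
    NF != 5 && NF != 6 {
      printf "line %d: expected 5 or 6 tab-separated columns, found %d\n", NR, NF > "/dev/stderr"
      bad = 1; exit
    }
    ncol == 0 { ncol = NF }               # remember the width of the first row
    NF != ncol {
      printf "line %d: mixes single-end and paired-end rows\n", NR > "/dev/stderr"
      bad = 1; exit
    }
    END {
      if (bad) exit 1
      if (ncol == 0) { print "config file is empty" > "/dev/stderr"; exit 1 }
      print (ncol == 5 ? "single-end" : "paired-end")
    }
  ' "$1"
}
```

For example, `validate_cipher_config my_config.txt && nextflow run cipher.nf ...` only launches the workflow if the check passes (`my_config.txt` is a placeholder name). A file saved with spaces instead of tabs is rejected, since each of its lines then parses as a single column.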
1) Install required dependencies
2) Create the Singularity container (this requires `sudo` access, so the container can be created on a local laptop/desktop and then transferred to the appropriate location/machine/cluster)
```
sudo singularity create -s 8000 cipher.img
```
```
sudo singularity bootstrap cipher.img Singularity
```
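Before running the workflow, a quick way to confirm that the tools are reachable is to look them up on `PATH`. The helper below (`check_commands`, a hypothetical name) is a sketch; it defaults to `nextflow` and `singularity` because those are the two tools this README relies on, and any other tool names you pass to it are your own assumption about what your pipeline needs.

```shell
# check_commands [COMMAND...]
# Hypothetical helper (not part of CIPHER). Reports which commands are on
# PATH and returns 0 only if all of them are found. With no arguments it
# checks nextflow and singularity.
check_commands() {
  [ "$#" -eq 0 ] && set -- nextflow singularity
  local missing=0 cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd"
    else
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```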
3) Run your workflow
```
nextflow run cipher.nf -with-singularity <cipher.img> --mode <MODE> --config <CONFIG> --fa <FASTA> --gtf <GTF> --lib <LIB> --readLen <LENGTH> [options]
```
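Since a wrong path to the image, config, FASTA, or GTF file may only surface after Nextflow has started scheduling tasks, a pre-flight check can save a failed run. This is a sketch with a hypothetical helper name (`require_files`); it only checks that each file exists and is non-empty, not that its contents are valid.

```shell
# require_files FILE...
# Hypothetical helper (not part of CIPHER). Returns 1 if any of the given
# files is missing or empty, listing each problem on stderr.
require_files() {
  local status=0 f
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `require_files cipher.img my_config.txt genome.fa genes.gtf && nextflow run cipher.nf -with-singularity cipher.img --mode <MODE> --config my_config.txt --fa genome.fa --gtf genes.gtf --lib <LIB> --readLen <LENGTH>`, where the file names are placeholders for your own inputs.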
NOTE: If you are not running on a cluster, set the `-qs <INT>` flag to limit the number of processes that CIPHER runs in parallel. If too many run at once, the machine can run out of memory and the workflow will end abruptly:
```
nextflow run -qs <INT> cipher.nf ...
```
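Choosing a value for `-qs` is a judgment call. The sketch below (hypothetical helper `suggest_queue_size`, not part of CIPHER or Nextflow) takes the smaller of the CPU count and the total RAM divided by an assumed per-task memory budget. The 4 GiB default budget is an assumption, not a measured requirement of any CIPHER step, and the RAM check only applies on Linux, where `/proc/meminfo` exists; elsewhere only the CPU count is used.

```shell
# suggest_queue_size [GIB_PER_TASK]
# Hypothetical helper (not part of CIPHER). Prints min(online CPUs,
# total RAM / GIB_PER_TASK), never less than 1. GIB_PER_TASK defaults to 4,
# an assumed budget; raise it for memory-hungry steps.
suggest_queue_size() {
  local gib_per_task=${1:-4} qs mem_gib by_mem
  qs=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
  if [ -r /proc/meminfo ]; then
    # MemTotal is reported in kB; convert to whole GiB
    mem_gib=$(awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo)
    if [ "${mem_gib:-0}" -gt 0 ]; then
      by_mem=$(( mem_gib / gib_per_task ))
      [ "$by_mem" -lt "$qs" ] && qs=$by_mem
    fi
  fi
  [ "$qs" -lt 1 ] && qs=1
  echo "$qs"
}
```

For example, `nextflow run -qs "$(suggest_queue_size 8)" cipher.nf ...` would budget 8 GiB per task.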
NOTE: If you would like to run CIPHER without Singularity containers, make sure that you have installed all the software required by your specific pipeline. The tools used can be found in the main `cipher.nf` script.
Example data to test CIPHER's workflows can be found in the `example_data` folder. Before running the workflow, update the fastq paths in the example config file to point to where the files are on your machine; otherwise the run will fail.
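One way to update the paths is to rewrite them with `awk`. The sketch below assumes the FASTQ files all sit in a single directory and that the config follows the layout described above, so the FASTQ columns are column 3 for single-end rows and columns 3 and 4 for paired-end rows. The helper name and the example file names are hypothetical; the actual config file name inside `example_data` may differ.

```shell
# relocate_fastq_paths CONFIG NEW_DIR
# Hypothetical helper (not part of CIPHER). Prints CONFIG with every fastq
# path replaced by NEW_DIR/<original file name>; other columns are untouched.
# Fastq columns are taken to be 3..NF-2: column 3 in 5-column (single-end)
# rows, columns 3-4 in 6-column (paired-end) rows.
relocate_fastq_paths() {
  awk -F'\t' -v OFS='\t' -v dir="${2%/}" '
    NF == 0 { print; next }                  # keep blank lines as-is
    {
      for (i = 3; i <= NF - 2; i++) {
        n = split($i, parts, "/")            # last element is the file name
        $i = dir "/" parts[n]
      }
      print
    }
  ' "$1"
}
```

For example, `relocate_fastq_paths example_data/config.txt "$PWD/example_data" > my_config.txt` writes a corrected copy and leaves the original file untouched.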
CIPHER can be executed on your own computer or on any cluster resource manager without modification.
Currently the following platforms are supported:
By default, the pipeline runs tasks in parallel as separate processes on the machine where the script is launched.
For example, to submit the execution to an SGE cluster, edit the file named `nextflow.config` in the directory where the `cipher.nf` file is found, adding the following content:
```
process {
    executor='sge'
    queue='<your queue name>'
}
```
In doing so, tasks will be executed through the `qsub` SGE command, so your pipeline will behave like any other SGE job script, with the benefit that Nextflow automatically and transparently manages task synchronisation, file staging/unstaging, etc.
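If you prefer not to edit the file by hand, the same `process` block can be written from the shell. This is a sketch with a hypothetical function name; it writes exactly the two settings shown in the SGE example, passes the executor name through unchecked, and refuses to overwrite an existing `nextflow.config` so that hand-made settings are not lost.

```shell
# write_executor_config EXECUTOR QUEUE [FILE]
# Hypothetical helper (not part of CIPHER). Writes a Nextflow process scope
# with the given executor and queue to FILE (default: nextflow.config in the
# current directory, which should be the one containing cipher.nf).
write_executor_config() {
  local executor=$1 queue=$2 file=${3:-nextflow.config}
  if [ -e "$file" ]; then
    echo "$file already exists; edit it by hand instead" >&2
    return 1
  fi
  cat > "$file" <<EOF
process {
    executor='$executor'
    queue='$queue'
}
EOF
}
```

For example, `write_executor_config sge all.q`, run from the directory containing `cipher.nf`, where `all.q` stands in for your queue name.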
More information regarding the platforms Nextflow supports and how to use them can be found in the Nextflow documentation.