This project is no longer under active maintenance. You're welcome to use it, but no updates or bug fixes will be posted. We recommend using Nextflow together with nf-core instead.
Many thanks to everyone who used and supported Cluster Flow over the years.
Find Cluster Flow documentation with information and examples at https://ewels.github.io/clusterflow/
Cluster Flow is a pipelining tool to automate and standardise bioinformatics analyses on high-performance cluster environments. It is designed to be easy to use, quick to set up and flexible to configure.
Cluster Flow is written in Perl and works by submitting jobs to a cluster; it can also be run locally. Each job is a stand-alone Perl executable that wraps a bioinformatics tool of interest.
Modules collect extensive logging information and Cluster Flow e-mails the user with a summary of the pipeline commands and exit codes upon completion.
You can find stable versions to download on the releases page.
You can get the development version of the code by cloning this repository:
git clone https://github.com/ewels/clusterflow.git
Once downloaded and extracted, create a clusterflow.config file in the script directory, based on clusterflow.config.example.
Next, you need to add the main cf executable to your PATH. This can be done as an environment module, with a symlink to bin, or by adding to your ~/.bashrc file.
Finally, run the setup wizard (cf --setup) and genomes wizard (cf --add_genome) and you're ready to go! See the installation docs for more information.
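As a rough sketch of the PATH step above, the two shell functions below cover the ~/.bashrc and symlink options. The function names, the default ~/bin target, and the assumption that the cf executable sits at the top level of the Cluster Flow directory are illustrative and not part of Cluster Flow itself; adjust the paths to match your installation.

```shell
# Hypothetical helpers for putting cf on your PATH (not shipped with Cluster Flow).

# Option 1: append a PATH entry to a shell rc file.
# The line is only written if it is not already present.
add_cf_to_path() {
    cf_dir="$1"                     # directory assumed to contain the cf executable
    rc_file="${2:-$HOME/.bashrc}"   # rc file to edit (defaults to ~/.bashrc)
    path_line="export PATH=\"\$PATH:$cf_dir\""
    grep -qxF -- "$path_line" "$rc_file" 2>/dev/null \
        || printf '%s\n' "$path_line" >> "$rc_file"
}

# Option 2: symlink cf into a bin directory that is already on your PATH.
link_cf_into_bin() {
    cf_dir="$1"                     # directory assumed to contain the cf executable
    bin_dir="${2:-$HOME/bin}"       # target bin directory (defaults to ~/bin)
    mkdir -p "$bin_dir"
    ln -sf "$cf_dir/cf" "$bin_dir/cf"
}
```

For example, assuming Cluster Flow was extracted to ~/clusterflow (a hypothetical location), running add_cf_to_path ~/clusterflow and then opening a new shell should make cf available.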
Pipelines are launched by naming a pipeline or module and the input files. A simple example could look like this:
cf sra_trim *.fastq.gz
Most pipelines need reference genomes, and Cluster Flow has built-in reference genome management. Parameters can be passed to modify tool behaviour.
For example, to run the fastq_bowtie pipeline (FastQC, TrimGalore! and Bowtie) with human data, trimming the first 6 bp of read 1, the command would be:
cf --genome GRCh37 --params "clip_r1=6" fastq_bowtie *.fastq.gz
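If you launch the same pipeline often, a small wrapper can catch two common mistakes before anything is submitted: cf missing from the PATH, and a glob that matched no files. The sketch below is only an illustration; the function name run_fastq_bowtie is hypothetical, and it hard-codes the clip_r1=6 parameter from the example above.

```shell
# Hypothetical wrapper around the fastq_bowtie example (not part of Cluster Flow).
# Usage: run_fastq_bowtie <genome> <input files...>
run_fastq_bowtie() {
    genome="$1"
    shift

    # Stop early if the cf executable cannot be found.
    if ! command -v cf >/dev/null 2>&1; then
        echo "cf not found on PATH" >&2
        return 1
    fi

    # An unmatched glob is passed through literally, so check the first file exists.
    if [ "$#" -eq 0 ] || [ ! -e "$1" ]; then
        echo "no input files found" >&2
        return 1
    fi

    cf --genome "$genome" --params "clip_r1=6" fastq_bowtie "$@"
}
```

It would be called as run_fastq_bowtie GRCh37 *.fastq.gz from the directory holding the input files.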
Additional common Cluster Flow commands are as follows:
cf --genomes # List available reference genomes
cf --pipelines # List available pipelines
cf --modules # List available modules
cf --qstat # List running pipelines
cf --qdel [id] # Cancel jobs for a running pipeline
Cluster Flow comes with modules and pipelines for the following tools:
| Read QC & pre-processing | Aligners / quantifiers | Post-alignment processing | Post-alignment QC |
|---|---|---|---|
| FastQ Screen | Bismark | bedtools (bamToBed, intersectNeg) | deepTools (bamCoverage, bamFingerprint) |
| FastQC | Bowtie 1 | subread featureCounts | MultiQC |
| TrimGalore! | Bowtie 2 | HTSeq Count | phantompeaktools (runSpp) |
| SRA Toolkit | BWA | Picard (MarkDuplicates) | Preseq |
| | HiCUP | Samtools (bam2sam, dedup, sort_index) | RSeQC (geneBody_coverage, inner_distance, junction_annotation, junction_saturation, read_GC) |
| | HISAT2 | | |
| | Kallisto | | |
| | STAR | | |
| | TopHat | | |
Please consider citing Cluster Flow if you use it in your analysis.
Cluster Flow: A user-friendly bioinformatics workflow tool [version 2; referees: 3 approved].
Philip Ewels, Felix Krueger, Max Käller, Simon Andrews
F1000Research 2016, 5:2824
doi: 10.12688/f1000research.10335.2
@article{Ewels2016,
author = {Ewels, Philip and Krueger, Felix and K{\"{a}}ller, Max and Andrews, Simon},
title = {Cluster Flow: A user-friendly bioinformatics workflow tool [version 2; referees: 3 approved].},
journal = {F1000Research},
volume = {5},
pages = {2824},
year = {2016},
doi = {10.12688/f1000research.10335.2},
URL = {http://dx.doi.org/10.12688/f1000research.10335.2}
}
Contributions and suggestions for new features are welcome, as are bug reports! Please create a new issue. Cluster Flow has extensive documentation describing how to write new modules and pipelines.
There is a chat room for the package hosted on Gitter where you can discuss things with the package author and other developers: https://gitter.im/ewels/clusterflow
If in doubt, feel free to get in touch with the author directly: @ewels (phil.ewels@scilifelab.se)
Project lead and main author: @ewels
Code contributions from: @s-andrews, @FelixKrueger, @stu2, @orzechoj, @darogan and others. Thanks for your support!
Cluster Flow is released under the GPL v3 licence. Cluster Flow is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. For more information, see the licence that comes bundled with Cluster Flow.