LUMC / HAMLET

RNA-seq pipeline for AML diagnostics developed in collaboration with the LUMC Hematology department
MIT License
6 stars 4 forks source link

Continuous Integration PEP compatible GitHub release Commits since latest release

Hamlet

Hamlet is a pipeline for analysis of human acute myeloid leukemia RNA-seq samples. Please use the public github repository to open issues or pull requests.

Four distinct analysis modules comprise Hamlet, which can be run independently and have their own documentation:

  1. qc-seq, for adapter trimming and quality control
  2. snv-indels, for small variant detection
  3. fusion, for fusion gene detection
  4. itd, for tandem duplication detection

Everything is tied together by a main Snakefile using modules.

HAMLET is build to use Singularity to run every Snakemake rule inside its own container. The base execution environment for HAMLET defined by an environment.yml file.

In addition to the raw output files, Hamlet also generates a PDF report containing an overview of the essential results and a JSON file containing the underlying data that are shown in the report.

Installation

The dependencies required for running the pipeline are listed in the provided environment.yml file. To use it, first make sure that you have Conda installed on your system. Then, set up a Conda virtual environment and activate it:

# Set up and activate your conda environment.
# Install the dependencies
conda env create -f environment.yml

# Activate the conda environment
conda activate HAMLET

Additionally, singularity version 3 or greater should be installed on the system.

Data files

Automatically generate the required reference files for the HAMLET pipeline in the HAMLET-data folder with

snakemake \
    --snakefile utilities/deps/Snakefile \
    --use-singularity \
    --singularity-args '--cleanenv --bind /tmp' \
    --directory HAMLET-data

Next, you can automatically generate a configuration file with the following helper script

python3 utilities/create-config.py HAMLET-data

Testing

The following commands can be used to test different aspects of HAMLET. If any of the tests fail, you can inspect the log.err and log.out files in the run folder.

Activate the HAMLET conda environment you installed above.

conda activate HAMLET

To test if all dependencies of HAMLET have been installed, use

pytest --kwd --tag sanity

To test if HAMLET can parse the example configurations and find the appropriate output files, use

pytest --kwd --tag dry-run

To test the full behaviour of HAMLET, you can use

pytest --kwd --tag functional

Usage

Input files

HAMLET requires two separate input files. Firstly, a json file that contains the settings and reference files for the pipeline, see above.

Secondly, HAMLET requires a Portable Encapsulated Project configuration that specifies the samples and their associated gzipped, paired-end mRNA-seq files. For simple use cases, this can be a csv file with one line per read-pair, as can be seen here.

Any number of samples can be processed in a single execution, and each sample may have any number of read pairs, and HAMLET will handle those properly.

Note that spaces in sample names are not supported

Execution

If running in a cluster, you may also want to define the resource configurations in another YAML file. Read more about this type of configuration on the official Snakemake documentation. For this file, let's call it config-cluster.yml

Example command

$ snakemake -s Snakefile \
    --configfile config.json \
    --config pepfile=sample_sheet.csv \
    --cluster-config config-cluster.yml \
    --rerun-incomplete \
    --use-singularity \
    --singularity-args ' --containall' \
    # ... other flags

Explanation for the various flags

flag description required
--configfile config.json The configuration file for the pipeline Yes
--cluster-config A cluster configuration file, only relevant when you are running HAMLET on a cluster No
--config pepfile=sample_sheet.csv A PEP configuration file that contains all samples, can be CSV Yes
--rerun-incomplete Re-run jobs if the output appears incomplete No
--use-singularity Use Singularity images to fetch all required dependencies. Yes
--singularity-args Arguments to pass to singularity. Use --bind to specify which folders on your system should be accessible inside the container. This should at least be the folders where your samples and reference files are located Yes

Output files

Assuming the output directory is set to /path/to/output, Hamlet will create /path/to/output/{sample_name} for each sample present in the config file. Inside the directory, there will be a PDF report called hamlet_report.{sample_name}.pdf which contains the overview of the essential results. The same data is also present in the JSON file called {sample_name}.summary.json.

Grouping results from multiple samples

If you analysed multiple samples using HAMLET, you can generate an overview of multiple samples using the utilities/hamlet_table.py script, rather than reading many PDF files. This script uses the {sample_name}.summary.json files which are generated as part of the default HAMLET output.

$ python3 utilities/hamlet_table.py --help
usage: hamlet_table.py [-h] [--itd-gene ITD_GENE] {variant,fusion,itd} json_files [json_files ...]

positional arguments:
  {variant,fusion,itd}  Table to output
  json_files

options:
  -h, --help            show this help message and exit
  --itd-gene ITD_GENE

Notes

  1. You can run Hamlet from anywhere, but preferably this is done outside of the repository. This way, the temporary Snakemake files are written elsewhere and does not pollute the repository.

Citation

If you use HAMLET in your research, please cite the HAMLET publication.

Common issues

Using singularity

If you forget the --use-singularity flag for Snakemake, you will find that many rules break due to the required tools not being available on your system.

Snakemake errors about reserved keywords

If you install Snakemake manually instead of using Conda and the provided environment.yml file, you might get errors about reserved keyword that are used in the Snakefiles. Please use the Snakemake version specified in the environment.yml file.