CCBR / CCBR_tobias

Tobias implementation for ATAC seq data.
MIT License

CCBR TOBIAS snakemake pipeline

TOBIAS, or "Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal", is a framework of tools for investigating transcription factor (TF) binding from ATAC-seq signal. The analysis involves numerous sequential steps (tasks) that must run in order to predict TF occupancy footprints from deduplicated alignment BAM files derived from raw ATAC-seq data (fastq files). Here we use Snakemake to automate the sequential execution on any HPC. Most tools used by the pipeline are containerized in Docker format and can be invoked using Singularity on the HPC. The minimum requirements for running this pipeline are:

This pipeline was built using the CCBR_SnakemakePipelineCookiecutter.

Please visit the following pages for more details directly from the authors of TOBIAS:

Quick start instructions for running CCBR_tobias on Biowulf

Various versions of the pipeline have been checked out at /data/CCBR_Pipeliner/Pipelines/CCBR_tobias on Biowulf. You can get help about running the pipeline using:

%  bash /data/CCBR_Pipeliner/Pipelines/CCBR_tobias/v0.2/run_tobias.bash --help
Pipeline Dir: /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/CCBR_tobias/v0.2
Git Commit/Tag: 6c8726023269ace0fd8fe886a1213859b363f9fd    v0.2
/data/CCBR_Pipeliner/Pipelines/CCBR_tobias/v0.2/run_tobias.bash: run CCBR TOBIAS workflow for ATAC seq data
USAGE:
  bash /data/CCBR_Pipeliner/Pipelines/CCBR_tobias/v0.2/run_tobias.bash -m/--runmode=<MODE> -w/--workdir=<path_to_workdir>
Required Arguments:
1.  RUNMODE: [Type: String] Valid options:
    *) init : initialize workdir
    *) run : run with slurm
    *) reset : DELETE workdir dir and re-init it
    *) dryrun : dry run snakemake to generate DAG
    *) unlock : unlock workdir if locked by snakemake
    *) runlocal : run without submitting to sbatch
2.  WORKDIR: [Type: String]: Absolute or relative path to the output folder with write permissions.

The pipeline requires only 2 arguments:

Generally, we anticipate CCBR_tobias to be run in 3 steps:

1. Initialize

% bash /data/CCBR_Pipeliner/Pipelines/CCBR_tobias/dev/run_tobias.bash -m=init -w=/path/to/outfolder

This creates the output folder, so the folder should not exist before running init. Along with other scripts and files, init copies config.yaml and cluster.json to the output folder, where the user can then edit them. Some key input values that need to be edited before running the pipeline are as follows:

database      organism     version
HOCOMOCO_v11  Human        Core
HOCOMOCO_v11  Human        Full
HOCOMOCO_v11  Mouse        Core
HOCOMOCO_v11  Mouse        Full
HOCOMOCO_v11  Human+Mouse  Core
HOCOMOCO_v11  Human+Mouse  Full
JASPAR2020    -            core_nonredundant
JASPAR2020    -            core_redundant
JASPAR2020    vertebrate   core_nonredundant
JASPAR2020    vertebrate   core_redundant

2. Dryrun

% bash /data/CCBR_Pipeliner/Pipelines/CCBR_tobias/dev/run_tobias.bash -m=dryrun -w=/path/to/outfolder

Running the above command ensures that

3. Run

After a successful dryrun, the user can run the same command with the -m=run option to submit jobs to the slurm job scheduler on Biowulf. By default, the norm partition is used to run jobs, but that and other job parameters can be changed by editing the cluster.json file in the output folder.
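The three steps above can be wrapped in a few shell functions. This is a minimal sketch, not part of the pipeline: the function names tobias_cmd, tobias_init and tobias_submit and the TOBIAS_ECHO variable are hypothetical, and the default script path is simply the v0.2 path from the help output above. Setting TOBIAS_ECHO=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with CCBR_tobias.
# Default path is the v0.2 checkout shown in the help output; override via PIPELINE.
PIPELINE="${PIPELINE:-/data/CCBR_Pipeliner/Pipelines/CCBR_tobias/v0.2/run_tobias.bash}"

# Run one pipeline invocation, or only print it when TOBIAS_ECHO=1.
tobias_cmd() {
    local mode="$1" workdir="$2"
    # Accept only the run modes listed in the help output.
    case "$mode" in
        init|run|reset|dryrun|unlock|runlocal) ;;
        *) echo "unknown run mode: $mode" >&2; return 2 ;;
    esac
    if [ "${TOBIAS_ECHO:-0}" = "1" ]; then
        echo "bash $PIPELINE -m=$mode -w=$workdir"
    else
        bash "$PIPELINE" -m="$mode" -w="$workdir"
    fi
}

# Step 1: init expects the output folder not to exist yet.
tobias_init() {
    local workdir="$1"
    if [ -e "$workdir" ]; then
        echo "refusing to init: $workdir already exists" >&2
        return 1
    fi
    tobias_cmd init "$workdir"
}

# Steps 2 and 3: submit to slurm only if the dry run succeeds.
tobias_submit() {
    local workdir="$1"
    tobias_cmd dryrun "$workdir" && tobias_cmd run "$workdir"
}
```

A typical session would call tobias_init /path/to/outfolder, edit config.yaml (and cluster.json if needed) in that folder, then call tobias_submit /path/to/outfolder.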

Expected Outputs:

The following folders are expected upon successful completion.

bams

Individual replicate alignment BAMs are merged together and pre-sorted. This folder contains the merged BAMs.

coverage

The merged BAMs are converted to normalized bigwigs for visualization with IGV. The bigwigs can be found here.

bias_correction

The merged BAMs from the bams folder are corrected for Tn5 insertion bias. 4 separate bigwigs are expected as output on a per-condition basis:

footprinting

Using the bias-corrected bigwig, a per-condition footprinting bigwig is created, limited to the "regions of interest" defined by the peaks in the config.yaml.

peaks

Supplied peaks are annotated using UROPA and annotations are stored here.

TFBS_{contrast}

One TFBS folder is created for each contrast by running bindetect. Each TFBS folder contains numerous (100s) subfolders, one for each motif in the motif database selected using the motifs parameter in config.yaml. Each of these per-TF-motif subfolders also has a standard folder structure, including a subfolder named beds. This contains:

For more details, see https://github.com/loosolab/TOBIAS/wiki/BINDetect

Caution: This folder has a large digital footprint. Each contrast produces approximately 40-60 GB of files. Hence, only run those contrasts that are interesting. DO NOT RUN ALL JUST BECAUSE YOU CAN!
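To keep an eye on this footprint, the size of each TFBS folder can be reported with du. This is a minimal sketch: the function name tfbs_usage is hypothetical, and it assumes the TFBS_{contrast} folders sit directly under the output folder passed to -w.

```shell
# Hypothetical helper: print the disk usage of every TFBS_{contrast} folder
# found directly under the given output folder.
tfbs_usage() {
    local workdir="$1" d
    for d in "$workdir"/TFBS_*/; do
        [ -d "$d" ] || continue   # skip the literal pattern when nothing matches
        du -sh "$d"
    done
}
```

For example, tfbs_usage /path/to/outfolder prints one line per contrast, and nothing if no contrast has been run yet.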

This folder also contains:

which are the key results for this contrast as a table and as plots.

overview_{contrast}

The "bound" BED files for all the TF motifs considered are concatenated and reported here as 2 sorted and indexed BED files. Because they are indexed, they can be easily loaded into an IGV session for visual inspection.

network

TF-TF binding networks are created with TOBIAS CreateNetwork for the first condition in each contrast. An adjacency matrix and a list of edges are reported individually for each TF motif and summarized overall for each network.
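After a run, the folder list above can be checked with a short shell function. This is a minimal sketch: the function name check_outputs is hypothetical, and it assumes all the folders named above sit directly under the output folder, with at least one TFBS_{contrast} and one overview_{contrast} folder.

```shell
# Hypothetical helper: list expected output folders that are missing from
# the given output folder. Returns 0 if everything is present, 1 otherwise.
check_outputs() {
    local workdir="$1" missing=0 name prefix
    # Fixed-name folders described above.
    for name in bams coverage bias_correction footprinting peaks network; do
        if [ ! -d "$workdir/$name" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    # Per-contrast folders: expect at least one match for each prefix.
    for prefix in TFBS_ overview_; do
        if ! ls -d "$workdir/$prefix"*/ >/dev/null 2>&1; then
            echo "missing: ${prefix}{contrast}"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_outputs /path/to/outfolder prints one line per missing folder and exits non-zero if any are absent. It only checks that the folders exist, not that their contents are complete.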