Ribo-DT is a snakemake pipeline to infer single-codon and codon-pair dwell times as well as gene flux from ribosome profiling data using generalized linear model (GLM) with negative binomial noise (Gobet et al., PNAS, 2020).
Ribo-DT consists of a Snakefile
, a conda
environment file (Ribo_DT.yaml
), a configuration file (config.yaml
) and a set of R
and perl
scripts to infer ribosome dwell times and gene flux from raw ribosome profiling data.
ENSEMBL
according to the species defined in the configuration file (config.yaml
). STAR
.glm4
function.Note that when RNA-seq and Ribo-seq are provided for the same sample, RNA-Seq is fitted first and used as an GLM offset in the Ribo-seq fit to reduce library preparation bias.
Prerequisites
The current workflow is designed to run on high computing cluster with slurm
workload manager but running_command.sh
can be modified to match your cluster configuration. All the required packages and softwares are installed via the conda
environment file (Ribo_DT.yaml
).
Clone the github repository and change directory
Clone this git repository to the location where you want to run your analysis
git clone https://github.com/cgob/codonDT_snakemake.git
cd codonDT_snakemake
Download and install conda (select an appropriate version for your operating system)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
Restart your terminal
Activate conda environment
This step install snakemake, all the required softwares, librairies, and dependencies in a conda
environment to run the workflow.
conda env create -f Ribo_DT.yaml
conda activate Ribo_DT
Sample file
Edit the sample file (samples.tsv
or your own file).
GEO
accession number of the study of interest. Click on run selector, select your samples of interest and download the metadata text file. This file can be edited to meet the sample file requirements.Configuration file
Edit the configuration file (config.yaml
). Set:
workdir
with the path of the output and working files. homedir
with the path of the Snakefile directory. refdir
with the path of the reference genome directory.species
with the proper species for your dataset (Mouse, Human, Yeast). adapter
with the 3'-adapter used during library preparation of your dataset.L1
with the read size lower bound ( L1
> reads > L2
are kept for the fit). L2
with the read size upper bound. library
with 'pos_neg' or 'neg_pos' depending on the strandness configuration of your library preparation. samples
with the tab-delimited file defined above describing your samples.filter_1
with filter threshold for the minimum number of reads per gene.filter_2
with p-value threshold for dwell times in the heatmap representation.5p
or 3p
defining which read ends to use to compute A site offsets from the pile-up densities at the start codons.Running the pipeline
bash running_command.sh
Running the pipeline on a personal computer
snakemake --cores N
with N the number of CPU cores to be used at the same time.
Cédric Gobet