This pipeline was developed using the Snakemake workflow management system. To run it, you need the Snakefile, the `env` folder and its contents (YAML files with environment definitions), and a table (specified in `config.yaml`) with the absolute paths of the forward and reverse read files.
To run on your computer:
snakemake --use-conda
To run in a High Performance Computing cluster with the SGE job scheduler:
snakemake --cluster "qsub -V -cwd -pe smp {threads}" --use-conda -j <# of jobs>
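
This README suggests setting the job count to the number of `.fastq.gz` files divided by 2. A minimal sketch of that calculation, assuming all paired-end read files sit in one directory; `count_jobs` is a hypothetical helper, not part of the pipeline:

```shell
# count_jobs [READS_DIR]
# Prints the number of .fastq.gz files in READS_DIR (default: current
# directory) divided by 2, i.e. one job per forward/reverse pair.
count_jobs() {
    reads_dir="${1:-.}"
    # find avoids parsing ls output and copes with an empty directory
    n_files=$(find "$reads_dir" -maxdepth 1 -type f -name '*.fastq.gz' | wc -l)
    echo $(( n_files / 2 ))
}
```

The result can then be passed to `-j`, for example `-j "$(count_jobs /path/to/reads)"`, where `/path/to/reads` is a placeholder for your own data folder.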
The following attributes can be changed or specified in the `config.yaml` file:
results
This workflow requires the Conda package manager, which handles the installation of tools and their dependencies.
This workflow is written in, and therefore requires, Snakemake, which can be installed using Conda. Once Conda is installed, the following command will create a Conda environment with Snakemake (and an additional dependency, Mamba):
conda create -n <env-name> -c bioconda -c conda-forge snakemake mamba
replacing `<env-name>` with a name of your choice.
An optional portion of the workflow performs rarefaction on Kraken2 output using a tool called Krakefaction, producing taxa discovery rate tables. You can enable it with the `perform_rarefaction` flag in the config file. To run this subworkflow, you must first install Krakefaction as follows:
.fastq.gz
)config.yaml
.
- A `.tab` file (e.g. `samples_new.tab`) that contains all the filenames of the read files

git clone https://github.com/BeeCSI-Microbiome/taxonomic_profiling_pipeline.git

- The `.tab` file (containing the sample names) is specified in the `config.yaml` file.
- Edit `samples_new.tab` to point to the raw data files (e.g. add `../` before all the file names), or copy all the contents of the repository to the same folder where the samples are, e.g. `cp -r taxonomic_profiling_pipeline/* .`
conda activate Snakemake

snakemake -nr

snakemake --cluster "qsub -V -cwd -pe smp {threads}" --use-conda -j <number_of_jobs> [--latency-wait <seconds>]

- Replace `<number_of_jobs>` with the number of `.fastq.gz` files divided by 2.
- `--latency-wait` is optional. Rules sometimes raise a false error saying that the output file has not been produced when it actually has. A wait of 60 s has prevented this error.

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh

- Press `<Enter>` to continue.
- Press `q` to close the license, or `<Enter>` to scroll through it until you've read it all.
- Type `yes` to accept the license, then press `<Enter>`.
- Type `no` and press `<Enter>`.
source ~/miniconda3/etc/profile.d/conda.sh

This sets up the `conda` command for your environment. You will likely want to run this command every time you connect to the Biocluster, so you are encouraged to edit your `~/.bashrc` file and add that command to the end of the file.

conda update conda
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
1) Check that the file is formatted correctly: `<Sample name><tab><sample-R1 filepath><tab><sample-R2 filepath>`
- See `samples.tab` for an example.
- Note that some editors replace tabs with spaces, which may be the cause of this error.
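
The format check above can be automated. The sketch below assumes the table has no header line and that every line must hold exactly three non-empty, tab-separated fields; `check_samples_tab` is a hypothetical helper, not something the pipeline provides:

```shell
# check_samples_tab FILE
# Reports every line that does not have exactly three non-empty,
# tab-separated fields and returns non-zero if any such line exists.
# A line whose tabs were replaced by spaces shows up as a single field.
check_samples_tab() {
    awk -F '\t' '
        BEGIN { bad = 0 }
        NF != 3 || $1 == "" || $2 == "" || $3 == "" {
            printf "line %d: expected 3 tab-separated fields, found %d\n", NR, NF
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

For example, `check_samples_tab samples_new.tab` prints nothing and succeeds when the table is well formed.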
1) Below is one of several ways to add Krakefaction to your PATH.
a) Install Krakefaction according to the guidelines above.
b) Find your `.bashrc` file (it should be in your home directory; contact your IT department if you can't find it).
c) Append the following line to your `.bashrc` file: `export PATH="<absolute path to the directory in which krakefaction was installed>/krakefaction/bin:$PATH"`
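
Steps b) and c) can also be done from the command line. This is only a sketch: `add_krakefaction_to_path` is a hypothetical helper, and the install directory you pass to it must be the directory into which you actually installed Krakefaction. It appends the export line only if that exact line is not already in the file, so running it twice is harmless.

```shell
# add_krakefaction_to_path INSTALL_DIR [RC_FILE]
# INSTALL_DIR: absolute path to the directory in which krakefaction was installed.
# RC_FILE: shell startup file to edit (defaults to ~/.bashrc).
add_krakefaction_to_path() {
    install_dir="$1"
    rc_file="${2:-$HOME/.bashrc}"
    # \$PATH stays literal so it is expanded when the startup file is read
    line="export PATH=\"$install_dir/krakefaction/bin:\$PATH\""
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    if ! grep -qxF -e "$line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}
```

After calling it, run `source ~/.bashrc` or open a new shell so the change takes effect.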