
MAGpy

MAGpy (pronounced mag-pie) is a Snakemake pipeline for the downstream analysis of metagenome-assembled genomes (MAGs).

Citation

Robert Stewart, Marc Auffret, Tim Snelling, Rainer Roehe, Mick Watson (2018). MAGpy: a reproducible pipeline for the downstream analysis of metagenome-assembled genomes (MAGs). Bioinformatics, bty905.

Clean your MAGs

There are a few things you will need to do before you run MAGpy, and these are due to limitations imposed by the software MAGpy runs, rather than by MAGpy itself.

These are:

NEW RELEASE - June 2021

Install conda

Skip this step if you already have it. Installation instructions are in the official conda documentation.
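To see which of the install steps (conda, then Snakemake and mamba) you can skip, you could run a quick check like the one below. It is a minimal POSIX sh sketch, not part of MAGpy: `check_tools` is a hypothetical helper name, and it only tests whether a command of each name is on your `PATH`, not which version it is.

```shell
#!/bin/sh
# Hypothetical helper (not part of MAGpy): report which tools are already on PATH.
# It only checks that a command of that name exists, not its version.
check_tools() {
    n_missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            n_missing=$((n_missing + 1))
        fi
    done
    # Exit status: 0 if every tool was found, otherwise the number missing.
    return "$n_missing"
}

check_tools conda mamba snakemake || echo "Install the missing tools before continuing."
```

Any tool reported as `found` lets you skip the corresponding install step.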

Clone the repo

git clone https://github.com/WatsonLab/MAGpy.git
cd MAGpy

Install Snakemake and mamba

Skip this step if you already have them.

conda env create -f envs/install.yaml 
conda activate magpy_install

Run tests and install conda envs:

snakemake -rp -s MAGpy --cores 1 --use-conda test

Build the databases

This step builds a DIAMOND database of the whole of UniProt TrEMBL, so it needs a lot of memory: try 256 GB of RAM.

rm -rf magpy_dbs
snakemake -rp -s MAGpy --cores 16 --use-conda setup
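Because the setup step is memory-hungry, it can help to check the machine before starting it. The following is a minimal POSIX sh sketch, assuming a Linux host that exposes `/proc/meminfo`; `enough_ram` is a hypothetical helper name, and its default of 256 simply mirrors the suggestion above (compared in GiB, since `/proc/meminfo` reports 1024-based units).

```shell
#!/bin/sh
# Hypothetical helper (not part of MAGpy), Linux only: compare total RAM with
# the ~256 GB suggested for building the DIAMOND database.
# Usage: enough_ram [meminfo_file] [required_gib]
enough_ram() {
    meminfo="${1:-/proc/meminfo}"
    required_gib="${2:-256}"
    # /proc/meminfo reports MemTotal in kB (really KiB), e.g. "MemTotal: 263891012 kB".
    total_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    # Return 2 if the file could not be read or has no MemTotal line.
    [ -n "$total_kb" ] || return 2
    # Integer division rounds down; MemTotal also excludes memory the kernel
    # reserves, so a nominal 256 GB machine may report slightly less.
    total_gib=$((total_kb / 1024 / 1024))
    echo "MemTotal: ${total_gib} GiB (suggested: ${required_gib} GiB)"
    [ "$total_gib" -ge "$required_gib" ]
}
```

Since a nominal 256 GB machine can come up a few GiB short, treat a near-miss as a warning rather than a hard stop, for example `enough_ram || echo "warning: less RAM than suggested" >&2` before running the setup command.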

Run MAGpy

snakemake -rp -s MAGpy --use-conda --cores 8

For large workflows, I recommend you use cluster or cloud execution.

Also, PhyloPhlAn takes a long time on large numbers of MAGs: a couple of thousand MAGs can take a few weeks.