MAGpy (pronounced mag-pie) is a Snakemake pipeline for downstream analysis of metagenome-assembled genomes (MAGs).
Robert Stewart, Marc Auffret, Tim Snelling, Rainer Roehe, Mick Watson (2018) MAGpy: a reproducible pipeline for the downstream analysis of metagenome-assembled genomes (MAGs). Bioinformatics, bty905.
Before you run MAGpy, you need to complete a few setup steps. They stem from limitations of the software MAGpy runs, not of MAGpy itself.
The steps are:
Install conda. Skip this step if you already have it. Instructions are here
git clone https://github.com/WatsonLab/MAGpy.git
cd MAGpy
Install the tools listed in envs/install.yaml. Skip this step if you already have them.
conda env create -f envs/install.yaml
conda activate magpy_install
snakemake -rp -s MAGpy --cores 1 --use-conda test
The setup step builds a DIAMOND database of the whole of UniProt TrEMBL, so it needs a lot of RAM; try 256 GB.
rm -rf magpy_dbs
snakemake -rp -s MAGpy --cores 16 --use-conda setup
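Because the database build is memory-hungry, it may be worth checking the machine's RAM before launching it. The following is a minimal sketch, assuming a Linux host that exposes `/proc/meminfo`. The helpers `mem_total_gb` and `check_mem` are hypothetical and not part of MAGpy, and the threshold of 256 is simply the figure suggested above.

```shell
#!/bin/sh
# Pre-flight RAM check (sketch). Assumes Linux: reads MemTotal from /proc/meminfo.
# mem_total_gb and check_mem are hypothetical helpers, not part of MAGpy.

REQUIRED_GB=256   # figure suggested in the text; adjust as needed

mem_total_gb() {
    # MemTotal is reported in kB; convert to whole GiB.
    # An alternative file can be passed as $1 (useful for testing).
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}" 2>/dev/null
}

check_mem() {
    # $1 = available GiB, $2 = required GiB.
    # Returns 0 if enough, 1 if too little, 2 if $1 is not a number.
    case "$1" in
        ''|*[!0-9]*) echo "could not determine RAM"; return 2 ;;
    esac
    if [ "$1" -lt "$2" ]; then
        echo "only $1 GiB RAM found, $2 GiB suggested"
        return 1
    fi
    echo "RAM looks sufficient ($1 GiB)"
}

# Report only; the decision to proceed is left to the user.
check_mem "$(mem_total_gb)" "$REQUIRED_GB" || true
```

MemTotal is usually slightly below the installed RAM because the kernel reserves some memory, so a near miss is best treated as a warning rather than a hard failure.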
snakemake -rp -s MAGpy --use-conda --cores 8
For large workflows, I recommend you use cluster or cloud execution.
Also, for any large number of MAGs, PhyloPhlAn will take a long time - e.g. a few weeks for a couple of thousand MAGs.
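As one possible shape for the cluster execution recommended above, the sketch below assembles a submission command and prints it for inspection. It assumes Snakemake 7 or earlier (where the `--cluster` option is available) and an SGE-style scheduler providing `qsub`. The submit command, the job cap, and the helper name `magpy_cluster_cmd` are hypothetical and would need adapting to your site; the remaining flags are copied from the commands above.

```shell
#!/bin/sh
# Sketch only. Assumptions (not MAGpy requirements):
#   - Snakemake 7 or earlier, where --cluster and --jobs are available
#   - an SGE-style scheduler with qsub on the PATH
SUBMIT_CMD=${SUBMIT_CMD:-"qsub -cwd -V"}   # hypothetical submit command
MAX_JOBS=${MAX_JOBS:-100}                  # hypothetical cap on concurrent jobs

magpy_cluster_cmd() {
    # Print the command instead of running it, so it can be checked first.
    printf 'snakemake -rp -s MAGpy --use-conda --cluster "%s" --jobs %s\n' \
        "$SUBMIT_CMD" "$MAX_JOBS"
}

magpy_cluster_cmd
```

Once the printed command looks right for your scheduler, it could be run with `eval "$(magpy_cluster_cmd)"` from the MAGpy directory.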