Sebastien-Raguideau / Metahood

Snakemake Assembly pipeline

Metahood

Metahood is a pipeline based entirely on Snakemake, aimed at general analysis of metagenomic short reads. It lets you easily assemble, annotate and bin your samples.

What the pipeline does :

How to install Metahood:

Installation consists of creating a conda environment, from which all subsequent calls to Metahood must be made.

An exhaustive list of all dependencies can be found in conda_envs/conda_env.yaml. For speed, we strongly advise using mamba instead of conda to solve the environment. To install mamba:

conda install mamba -n base -c conda-forge

The environment can then be created with:

cd path_to_repos/Metahood
mamba env create -f conda_envs/conda_env.yaml

You then need to activate the corresponding environment using:

conda activate MetaHood
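
Once the environment is active, a quick sanity check can confirm that the expected executables are on the PATH. The sketch below is only an illustration: `check_tools` is a hypothetical helper, not part of Metahood, and it simply reports which of the commands passed to it cannot be found.

```shell
# Hypothetical helper (not part of Metahood): report which of the given
# commands are missing from PATH. Returns 0 only if all of them are found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools snakemake concoct_refine` checks the two executables named in this README.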

Fix CONCOCT install

Unfortunately, a bug still exists in the current conda package for CONCOCT. The following commands fix an issue with pandas and an issue with a missing argument:

CPATH=`which concoct_refine`
sed -i 's/values/to_numpy/g' $CPATH
sed -i 's/as_matrix/to_numpy/g' $CPATH
sed -i 's/int(NK), args.seed, args.threads)/ int(NK), args.seed, args.threads, 500)/g' $CPATH
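
Because `sed -i` edits the installed script in place, it may be worth keeping a pristine copy. The sketch below wraps the same three substitutions in a hypothetical function, `patch_concoct_refine` (a name made up for this example). It saves the original as `<file>.orig` on the first run and always regenerates the patched script from that copy, so running it twice is harmless.

```shell
# Hypothetical helper: apply the three substitutions above to the script
# given as $1, keeping an untouched copy next to it as "$1.orig".
patch_concoct_refine() {
    target="$1"
    [ -f "$target" ] || { echo "not a file: $target" >&2; return 1; }
    # Keep the first pristine copy; later runs reuse it.
    [ -e "$target.orig" ] || cp -p "$target" "$target.orig" || return 1
    # Read from the pristine copy and overwrite the target's contents.
    sed -e 's/values/to_numpy/g' \
        -e 's/as_matrix/to_numpy/g' \
        -e 's/int(NK), args.seed, args.threads)/ int(NK), args.seed, args.threads, 500)/g' \
        "$target.orig" > "$target"
}
```

With the environment active, it would be called as `patch_concoct_refine "$(which concoct_refine)"`.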

Databases

We rely on the CheckM HMMs for MAG quality assessment. Please download: https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz
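
A minimal sketch of fetching and unpacking that archive, assuming `wget` and `tar` are available. `fetch_checkm_data` and its default destination folder `checkm_data` are hypothetical names chosen for this example. Where Metahood expects the resulting path to be declared is not covered here.

```shell
CHECKM_URL="https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz"

# Hypothetical helper: download the CheckM archive into the folder given
# as $1 (default: ./checkm_data) and extract it there.
fetch_checkm_data() {
    dest="${1:-checkm_data}"
    mkdir -p "$dest" || return 1
    archive="$dest/$(basename "$CHECKM_URL")"
    wget -O "$archive" "$CHECKM_URL" || return 1
    tar -xzf "$archive" -C "$dest"
}
```

Calling `fetch_checkm_data /some/folder` leaves both the archive and its extracted contents in that folder.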

How to run Metahood:

conda activate MetaHood
path_to_repos/Metahood/Metahood.py <config file> --cores <nb threads> -s <snakemake options> 

Configuration file

The apparent lack of parameters is deceiving as all the complexity is hidden in a configuration file.
config.yaml

This config file is in YAML format, where indentation is significant. Be mindful of indentation!
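
One common indentation mistake is easy to catch mechanically: YAML does not allow tab characters for indentation. The sketch below, built around a hypothetical helper `check_yaml_tabs`, lists any line whose leading whitespace contains a tab. It covers only this one class of error and is not a full YAML validation.

```shell
# Hypothetical helper: print the lines of the file given as $1 whose leading
# whitespace contains a tab. Returns 1 if any are found, 2 if the file is
# missing, 0 otherwise.
check_yaml_tabs() {
    [ -f "$1" ] || { echo "not a file: $1" >&2; return 2; }
    tab=$(printf '\t')
    if grep -n "^ *$tab" "$1"; then
        echo "tab characters found in the indentation of $1" >&2
        return 1
    fi
    return 0
}
```

For example, `check_yaml_tabs config.yaml` prints the offending line numbers, if any.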

------ Resources ------

------ Output folder ------

------ Path to data folder ------

------ Samples preprocessing ------

------ Assembly parameters ------

NOTE that if neither per_sample nor groups is specified, no task will be carried out.

------ Binning parameters ------

Output Directory structure:

Example Dataset:

A synthetic community and example config files are available at:

wget  http://seb.s3.climb.ac.uk/Synth_G45_S03D.tar.gz

After uncompressing, you'll find two example config files, one for coassembly and the other (SSA) for Single Sample Assembly. In both, you'll need to replace "path_to_folder" with the location of the uncompressed folder.
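
The placeholder replacement can be scripted. The sketch below uses a hypothetical helper, `set_data_path`, that rewrites every occurrence of `path_to_folder` in a config file with the absolute path of the folder you pass in. It assumes the path contains no `|`, `&` or backslash characters, which would confuse `sed`.

```shell
# Hypothetical helper: replace the literal placeholder "path_to_folder" in
# the config file ($1) with the absolute path of the data folder ($2).
set_data_path() {
    config="$1"
    [ -f "$config" ] || { echo "not a file: $config" >&2; return 1; }
    folder=$(cd "$2" && pwd) || return 1
    # '|' is the sed delimiter so that '/' in the path needs no escaping.
    sed "s|path_to_folder|$folder|g" "$config" > "$config.tmp" &&
        mv "$config.tmp" "$config"
}
```

It would be called once per example config, as `set_data_path <config file> <uncompressed folder>`.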

Metahood.py --config <config file> --cores <nb threads> -s <snakemake options>