Thanks to the increased cost-effectiveness of high-throughput technologies, the number of studies focusing on microorganisms (bacteria, archaea, microbial eukaryotes, fungi, and viruses) and their connections with human health and diseases has surged, and, consequently, a plethora of approaches and software has been made available for their study, making it difficult to select the best methods and tools.
Here we present Yet Another Metagenomic Pipeline (YAMP), which starts from the raw sequencing data and, with a strong focus on quality control, processes them up to the functional annotation within hours (please refer to the YAMP wiki for more information).
YAMP is constructed on Nextflow, a framework based on the dataflow programming model, which allows writing workflows that are highly parallel, easily portable (including on distributed systems), and very flexible and customisable, characteristics which have been inherited by YAMP. New modules can be added easily and the existing ones can be customised -- even though we have already provided default parameters deriving from our own experience.
YAMP is accompanied by a set of (customisable) containers that save users the hassle of installing the required software and, at the same time, increase the reproducibility of the YAMP results (see Using Docker or Singularity).
Please cite YAMP as:
Visconti A., Martin T.C., and Falchi M., "YAMP: a containerised workflow enabling reproducibility in metagenomics research", GigaScience (2018), https://doi.org/10.1093/gigascience/giy072
To run YAMP you will need to install Nextflow (version 20.10 or higher), as explained here. Please note that Nextflow requires BASH and Java 8 or later. Both should already be available in most POSIX-compatible systems (Linux, Solaris, OS X, etc.).
If you are using the containerised version of YAMP (as we strongly suggest), you should also install Docker or Singularity, as explained here and here, respectively.
Once you have either Docker or Singularity up and running, you will not need to install any additional tools. All the pieces of software are already specified in the YAMP pipeline and will be downloaded during the first run. Please refer to Using Docker or Singularity for more details.
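The requirements above can be checked with a short script before the first run. This is a minimal sketch, not part of YAMP: the helper names `have_tool` and `version_ge` are hypothetical, the version comparison assumes a `sort` that supports `-V` (GNU coreutils), and the parsing assumes that `nextflow -version` prints a line containing a three-part version number.

```shell
#!/usr/bin/env bash

# Succeeds if a command is available on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if version $1 is greater than or equal to version $2.
# Assumption: `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required_nextflow="20.10.0"

# Report which of the required tools are present.
for tool in bash java nextflow; do
    if have_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done

# Compare the installed Nextflow version with the minimum required one.
if have_tool nextflow; then
    # Assumption: the first x.y.z number printed is the Nextflow version.
    installed="$(nextflow -version 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)"
    if [ -n "$installed" ] && version_ge "$installed" "$required_nextflow"; then
        echo "Nextflow $installed is recent enough"
    else
        echo "Nextflow ${installed:-unknown version} may be too old (need $required_nextflow or higher)"
    fi
fi
```

The script only reports what it finds and does not install anything, so it is safe to run on a shared system.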
For expert users only. If you do not want to use the containerised version of YAMP, you must install the following pieces of software:
All of them should be in the system path with execute and read permission.
Following the links, you will find detailed instructions on how to install them, as explained by their developers. Notably, many of these tools are also available in bioconda.
Clone the YAMP repository in a directory of your choice:
git clone https://github.com/alesssia/YAMP.git
A detailed description of what is included in the repository is available at this wiki page, while a description of the pre-set config files (used in our tutorials) is available at this wiki page.
YAMP requires a set of databases that are queried during its execution. Some of them are already available with YAMP, some are downloaded automatically, either the first time you use the tool (MetaPhlAn) or with specialised scripts (HUMAnN), and others must be created by the user. Specifically, you will need:
./assests/data/adapters.fa (please note that this file may need to be customised).
./assests/data/sequencing_artifacts.fa.gz and ./assests/data/phix174_ill.ref.fa.gz (please note that both may need to be customised).
More details on these files and how to use and get them are available on our wiki. Please read it carefully before using YAMP.
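Before the first run, it can be useful to confirm that the bundled files named above are present in your clone. The following is a minimal sketch, assuming the repository was cloned into a directory passed as the first argument; the function name `check_bundled_files` is hypothetical and not part of YAMP.

```shell
#!/usr/bin/env bash

# Prints "ok" or "missing" for each bundled reference file found under the
# given YAMP directory, and returns non-zero if any file is missing or empty.
check_bundled_files() {
    local yamp_dir="$1"
    local status=0
    local f
    for f in assests/data/adapters.fa \
             assests/data/sequencing_artifacts.fa.gz \
             assests/data/phix174_ill.ref.fa.gz; do
        if [ -s "$yamp_dir/$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            status=1
        fi
    done
    return "$status"
}

# Usage, assuming the repository was cloned into ./YAMP:
#   check_bundled_files ./YAMP
```

The check only covers the files shipped with the repository. The databases that are downloaded or created separately are described on the wiki.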
The simplest way to use YAMP (after having satisfied all the dependencies and requirements) is by using the following command:
nextflow run YAMP.nf --reads1 myfile_R1.fq.gz --reads2 myfile_R2.fq.gz --prefix my_sample \
    --outdir output_folder --mode complete -profile base,docker
or you can run a test with the following command:
nextflow run YAMP.nf -profile test,docker
More information on the YAMP parameters and running modes is available in the YAMP wiki, where there are also several tutorials.
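When several paired-end samples need processing, the command above can be generated once per sample. The sketch below is a dry run that only prints the commands. It assumes that files follow the `_R1.fq.gz` / `_R2.fq.gz` naming used in the example, that they live in a hypothetical `./data` directory, and that each sample gets its own sub-folder of `output_folder` (a precaution, not a YAMP requirement). The function name `yamp_command` is also hypothetical.

```shell
#!/usr/bin/env bash

# Prints the YAMP command for one paired-end sample, given the path to its R1 file.
yamp_command() {
    local r1="$1"
    # Derive the R2 file name and the sample prefix from the R1 file name.
    local r2="${r1%_R1.fq.gz}_R2.fq.gz"
    local prefix
    prefix="$(basename "$r1" _R1.fq.gz)"
    echo "nextflow run YAMP.nf --reads1 $r1 --reads2 $r2 --prefix $prefix --outdir output_folder/$prefix --mode complete -profile base,docker"
}

# Dry run: print one command per sample found in ./data.
for r1 in ./data/*_R1.fq.gz; do
    [ -e "$r1" ] || continue   # skip when the glob matches nothing
    yamp_command "$r1"
done
```

Once the printed commands look right, they can be executed one at a time or redirected into a script.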
YAMP takes advantage of a multi-image scenario. This means that each process will specify which container should be used, along with its version (as explained here).
YAMP also provides a docker and a singularity profile that can be used to tell Nextflow to enable the use of Docker/Singularity (as explained here), for instance using the following commands:
nextflow run YAMP.nf --reads1 myfile_R1.fq.gz --reads2 myfile_R2.fq.gz --prefix my_sample \
    --outdir output_folder --mode complete -profile base,docker
nextflow run YAMP.nf --reads1 myfile_R1.fq.gz --reads2 myfile_R2.fq.gz --prefix my_sample \
    --outdir output_folder --mode complete -profile base,singularity
Please note that Nextflow is not included in the Docker container and should be installed as explained here.
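If the same script has to run on machines where only one of the two container engines is installed, the profile can be chosen at run time. This is a minimal sketch under the assumption that the engines are exposed as `docker` and `singularity` on the PATH; the function name `pick_profile` is hypothetical, and the preference for Docker when both are present is an arbitrary choice.

```shell
#!/usr/bin/env bash

# Prints the value for -profile depending on which container engine is on
# the PATH. Docker is preferred when both are present. Returns non-zero if
# neither is found.
pick_profile() {
    if command -v docker >/dev/null 2>&1; then
        echo "base,docker"
    elif command -v singularity >/dev/null 2>&1; then
        echo "base,singularity"
    else
        echo "neither docker nor singularity found on PATH" >&2
        return 1
    fi
}

# Print (rather than run) the resulting command.
if profile="$(pick_profile)"; then
    echo "nextflow run YAMP.nf --reads1 myfile_R1.fq.gz --reads2 myfile_R2.fq.gz --prefix my_sample --outdir output_folder --mode complete -profile $profile"
fi
```

Finding the executable does not guarantee that the engine is running or that the current user may use it, so the first YAMP run remains the real test.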
We have listed all known issues and solutions on this wiki page. Please report any issue using the GitHub platform.
Alessia would like to thank:
YAMP is licensed under GNU GPL v3.
Enhancements:
Fixes:
standard profile as base
Fixes:
Fixes:
nextflow.config
Enhancements:
characterisation mode
Fixes:
characterisation mode
Fixes:
complete mode
Notes:
Enhancements:
keepQCtmpfile is true
Fixes:
Enhancements:
Enhancements:
nextflow.config file
Enhancements:
Enhancements: