genomicepidemiology / ARGprofiler

A pipeline for large-scale analysis of antimicrobial resistance genes and their flanking regions in metagenomic datasets
Apache License 2.0

ARGprofiler

A tool for large-scale analysis of antimicrobial resistance genes (ARGs) and their flanking regions in metagenomic datasets.

ARGprofiler pipeline

Introduction

ARGprofiler is a newly developed Snakemake pipeline designed to analyze ARGs' read distances, abundances, and genomic flanking regions in metagenomic sequencing data. It has been adapted to work for short-read sequencing datasets. The pipeline also includes the recently made PanRes database, a combined collection of current ARG databases, and ARGextender, an assembly tool for extending the genomic flanking region around genes of interest.

ARGprofiler uses the following tools:

The workflow is described in:

Martiny, H. M., Pyrounakis, N., Petersen, T. N., Lukjančenko, O., Aarestrup, F. M., Clausen, P. T., & Munk, P. (2024). ARGprofiler—a pipeline for large-scale analysis of antimicrobial resistance genes and their flanking regions in metagenomic datasets. Bioinformatics, 40(3), btae086. https://doi.org/10.1093/bioinformatics/btae086

Installation

The best way to install the ARGprofiler pipeline is to clone this GitHub repository. The pipeline uses the Conda package manager to deploy the defined software packages in the specified version without requiring admin or root privileges.

git clone https://github.com/genomicepidemiology/ARGprofiler.git

This command will create the ARGprofiler directory in the current directory.

Since ARGprofiler is a Snakemake pipeline, the user should first install the Snakemake workflow management system by following the official Snakemake installation guide.

Input

ARGprofiler takes as input a JSON file named input.json in the following format:

{"run_accession":{"type":"READ_TYPE"},"run_accession":{"type":"READ_TYPE"}}

Here, run_accession is the ENA ID of a read sequencing dataset, and READ_TYPE is either PAIRED or SINGLE. Both are quoted JSON strings.

Example:

{"ERR3593315":{"type":"PAIRED"},"SRR7533096":{"type":"SINGLE"}}
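For long lists of runs, writing this file by hand is error-prone. The following is a minimal sketch, not part of ARGprofiler itself, of how the file could be generated with Python. The function name `input_json` and the duplicate-accession check are assumptions made for illustration.

```python
import json

VALID_READ_TYPES = {"PAIRED", "SINGLE"}


def input_json(runs):
    """Serialize (run_accession, read_type) pairs in the input.json format.

    Hypothetical helper: it only mirrors the format described above.
    """
    entries = {}
    for accession, read_type in runs:
        read_type = read_type.upper()
        if read_type not in VALID_READ_TYPES:
            raise ValueError(
                f"{accession}: read type must be PAIRED or SINGLE, got {read_type!r}"
            )
        if accession in entries:
            raise ValueError(f"duplicate run accession: {accession}")
        entries[accession] = {"type": read_type}
    # Compact separators reproduce the whitespace-free layout of the example.
    return json.dumps(entries, separators=(",", ":"))


if __name__ == "__main__":
    print(input_json([("ERR3593315", "PAIRED"), ("SRR7533096", "SINGLE")]))
```

Redirecting the printed line into input.json gives a file in the format shown in the example.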

The user can also opt to specify the name of the input file in the Snakefile (with open...).
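Before starting a long run, it can help to check that the input file parses and that every entry has a valid read type. The sketch below is an assumption about how such a pre-flight check could look. It is not the code in the ARGprofiler Snakefile, and the names `split_by_read_type` and `load_runs` are hypothetical.

```python
import json


def split_by_read_type(entries):
    """Return (paired, single) lists of run accessions from a parsed input file."""
    paired, single = [], []
    for accession, info in entries.items():
        read_type = info.get("type") if isinstance(info, dict) else None
        if read_type == "PAIRED":
            paired.append(accession)
        elif read_type == "SINGLE":
            single.append(accession)
        else:
            raise ValueError(f"{accession}: 'type' must be PAIRED or SINGLE")
    return paired, single


def load_runs(path="input.json"):
    """Read the JSON input file and group its accessions by read type."""
    with open(path) as handle:
        return split_by_read_type(json.load(handle))
```

Calling `load_runs()` in the directory that holds input.json either returns the two lists or raises an error naming the offending accession.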

For instructions on how to analyze unpublished sequencing reads, see Tips and Tricks.

Running ARGprofiler

The user has the option to run the pipeline either on an HPC or locally. For running on HPC, we provide the option of executing the workflow using environment modules or conda packages.

HPC

The user should specify the preferred execution option in the config file. To use conda environments, keep use-conda:True; to use environment modules instead, replace it with use-envmodules:True.

To run ARGprofiler on an HPC with a queuing system, the user should execute the following command:

snakemake --profile profile_argprofiler

Locally

Although we designed ARGprofiler to run in an HPC environment (specifically Computerome), it is possible to run the pipeline locally. To do so, we recommend creating a mamba environment as follows:

mamba env create --name argprofiler --file rules/environment_argprofiler.yaml

Since the pipeline is not being executed on an HPC, the user should remove the cluster and cluster-config flags from the config file and add the cores flag, set to the number of cores Snakemake may use.
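The config change for a local run can also be scripted. The sketch below assumes the profile config is a YAML file with one top-level `key: value` entry per flag, which is the usual layout of a Snakemake profile. The function name `localize_profile` and the sample text in the usage note are hypothetical and do not reproduce the actual ARGprofiler profile.

```python
def localize_profile(text, cores, drop=("cluster", "cluster-config")):
    """Return profile text without the `drop` keys and with `cores` set.

    Works line by line on top-level keys only; indented continuation
    lines of a removed key are removed with it.
    """
    remove = set(drop) | {"cores"}  # replace any existing cores entry
    kept = []
    skipping = False
    for line in text.splitlines():
        top_level = bool(line.strip()) and not line[0].isspace()
        if top_level:
            key = line.split(":", 1)[0].strip()
            skipping = key in remove
        if not skipping:
            kept.append(line)
    kept.append(f"cores: {cores}")
    return "\n".join(kept) + "\n"
```

For example, applied to the made-up text `"cluster: submit.sh\nuse-conda: True\n"` with `cores=4`, the function returns `"use-conda: True\ncores: 4\n"`. Review the result before overwriting the real config file.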

Then activate the environment and run Snakemake:

mamba activate argprofiler
snakemake --profile profile_argprofiler

Output

After a successful run, ARGprofiler creates a directory named results, which contains the output of every analysis step. Results for single-end and paired-end reads are stored separately. More specifically:

Tips and Tricks

Citation

Martiny, H. M., Pyrounakis, N., Petersen, T. N., Lukjančenko, O., Aarestrup, F. M., Clausen, P. T., & Munk, P. (2024). ARGprofiler—a pipeline for large-scale analysis of antimicrobial resistance genes and their flanking regions in metagenomic datasets. Bioinformatics, 40(3), btae086. https://doi.org/10.1093/bioinformatics/btae086

Accompanying code and data for ARGprofiler publication

Code

Data

Feedback and issues

We welcome any comments, bug reports, and suggestions, as they will help us improve ARGprofiler. You can leave comments and bug reports in the repository issue tracker or reach out by e-mail to nipy@food.dtu.dk or hanmar@food.dtu.dk.