A tool for large-scale analysis of antimicrobial resistance genes (ARGs) and their flanking regions in metagenomic datasets.
ARGprofiler is a newly developed Snakemake pipeline designed to analyze ARGs' read distances, abundances, and genomic flanking regions in metagenomic sequencing data. It has been adapted to work for short-read sequencing datasets. The pipeline also includes the recently made PanRes database, a combined collection of current ARG databases, and ARGextender, an assembly tool for extending the genomic flanking region around genes of interest.
ARGprofiler uses the following tools:
fastq-dl for downloading raw reads from ENA
fastp for trimming and QC of raw reads
KMA for alignment of raw reads against reference databases
ARGextender for extracting the genomic flanking regions around ARGs
Mash for creating sketches to estimate genetic distances
The workflow is described in
Martiny, H. M., Pyrounakis, N., Petersen, T. N., Lukjančenko, O., Aarestrup, F. M., Clausen, P. T., & Munk, P. (2024). ARGprofiler—a pipeline for large-scale analysis of antimicrobial resistance genes and their flanking regions in metagenomic datasets. Bioinformatics, 40(3), btae086. https://doi.org/10.1093/bioinformatics/btae086
The best way to install the ARGprofiler pipeline is to clone this GitHub repository. The pipeline uses the Conda package manager to deploy the defined software packages in the specified version without requiring admin or root privileges.
git clone https://github.com/genomicepidemiology/ARGprofiler.git
This command will create the ARGprofiler directory in the current directory.
Since ARGprofiler is a Snakemake pipeline, the user should install the Snakemake workflow management system following the guide here.
ARGprofiler takes as input a JSON file named input.json in the following format:
{"run_accession":{"type":READ_TYPE},"run_accession":{"type":READ_TYPE}}
run_accession is the ENA ID of the sequencing read dataset, and READ_TYPE can be either PAIRED or SINGLE.
Example:
{"ERR3593315":{"type":"PAIRED"},"SRR7533096":{"type":"SINGLE"}}
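For longer lists of runs, the file can be generated rather than typed by hand. The following is a minimal sketch, assuming the format shown above; the function names build_input and write_input are hypothetical helpers and are not part of ARGprofiler.

```python
import json

# The two read types named in this README.
VALID_READ_TYPES = {"PAIRED", "SINGLE"}


def build_input(runs):
    """Return the input.json mapping for (run_accession, read_type) pairs."""
    entries = {}
    for accession, read_type in runs:
        read_type = read_type.upper()
        if read_type not in VALID_READ_TYPES:
            raise ValueError(
                f"{accession}: READ_TYPE must be PAIRED or SINGLE, got {read_type!r}"
            )
        entries[accession] = {"type": read_type}
    return entries


def write_input(runs, path="input.json"):
    """Write the mapping to a JSON file (default name taken from this README)."""
    with open(path, "w") as handle:
        json.dump(build_input(runs), handle)
```

Calling write_input([("ERR3593315", "PAIRED"), ("SRR7533096", "SINGLE")]) writes a file equivalent to the example above.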
The user can also opt to specify the name of the input file in the Snakefile (with open...).
For instructions on how to analyze unpublished sequencing reads, see Tips and Tricks.
The user has the option to run the pipeline either on an HPC or locally. For running on HPC, we provide the option of executing the workflow using environment modules or conda packages.
The user should specify the preferred option for executing the pipeline in the config file. To use a conda environment, keep use-conda:True; otherwise, replace it with use-envmodules:True.
To run ARGprofiler on an HPC with a queuing system, the user should execute the following command:
snakemake --profile profile_argprofiler
While we have designed ARGprofiler to run in an HPC environment (specifically Computerome), it is possible to run the pipeline locally. For a local run, we recommend creating a mamba environment as follows:
mamba env create --name argprofiler --file rules/environment_argprofiler.yaml
Since ARGprofiler is then not executed on an HPC, the user should remove the following flags from the config file: cluster, cluster-config and add the following flag: cores (the cores flag should be set to the number of cores for Snakemake to use).
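The config change described above can also be scripted. The following is a rough sketch, assuming the config file is a flat list of single-line "key: value" entries; the function name localize_profile_config is hypothetical, and the path to the config file inside profile_argprofiler is left to the user.

```python
def localize_profile_config(config_text, cores):
    """Drop the cluster flags and set cores in a flat 'key: value' config.

    Assumes every flag sits on a single line; multi-line values are not handled.
    """
    removed = {"cluster", "cluster-config"}
    kept = []
    for line in config_text.splitlines():
        # The key is everything before the first colon.
        key = line.split(":", 1)[0].strip()
        # Skip the cluster flags, and any existing cores entry so it is not duplicated.
        if key in removed or key == "cores":
            continue
        kept.append(line)
    kept.append(f"cores: {cores}")
    return "\n".join(kept) + "\n"
```

The function only transforms text, so the original file can be kept as a backup and the result reviewed before it is written back.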
Then activate the environment and run Snakemake:
mamba activate argprofiler
snakemake --profile profile_argprofiler
When successfully executed, ARGprofiler creates a directory named results, where the user can find the results from all the analysis steps (results are separated into single-read and paired-read results). More specifically:
raw_reads directory contains all the downloaded sequencing datasets.
trimmed_reads directory contains all the trimmed sequencing datasets.
kma_mOTUs directory contains all the alignment result files with the mOTUs database.
kma_panres directory contains all the alignment result files with the PanRes database.
argextender directory contains the genomic flanking regions extracted around ARGs.
Mash directory contains the Mash sketches for each sequencing dataset.
Tips and Tricks:
To analyze unpublished sequencing reads, create a directory named local_reads and place the sequencing reads (both paired and single) in that directory.
The pipeline makes use of Snakemake profiles to specify the configuration of the pipeline. The required flags are specified in the files of the profile_argprofiler directory.
logs directory in the main directory: log files for each job will be placed there.
Martiny, H. M., Pyrounakis, N., Petersen, T. N., Lukjančenko, O., Aarestrup, F. M., Clausen, P. T., & Munk, P. (2024). ARGprofiler—a pipeline for large-scale analysis of antimicrobial resistance genes and their flanking regions in metagenomic datasets. Bioinformatics, 40(3), btae086. https://doi.org/10.1093/bioinformatics/btae086
We welcome any comments, bug reports, and suggestions, as they will help us improve ARGprofiler. You can leave comments and bug reports in the repository issue tracker or reach out by e-mail to nipy@food.dtu.dk or hanmar@food.dtu.dk