franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License
bioinformatics computational-biology flux-balance-analysis genome-scale-metabolic-model gut-microbiome mags metabolic-modeling metabolic-models metabolism metagenome-assembled-genomes metagenomics microbial-ecology microbiome snakemake systems-biology

💎 metaGEM

Note An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data.

metaGEM is a Snakemake workflow that integrates an array of existing bioinformatics and metabolic modeling tools, for the purpose of predicting metabolic interactions within bacterial communities of microbiomes. From whole metagenome shotgun datasets, metagenome assembled genomes (MAGs) are reconstructed, which are then converted into genome-scale metabolic models (GEMs) for in silico simulations. Additional outputs include abundance estimates, taxonomic assignment, growth rate estimation, pangenome analysis, and eukaryotic MAG identification.

⚙️ Installation

You can install metaGEM on your cluster with a single command using the mamba package manager:

mamba create -n metagem -c bioconda metagem

This will create an environment called metagem and start installing dependencies. Please consult the config/README.md page for more detailed setup instructions.

🔧 Usage

Clone this repository and move into the workflow directory:

git clone https://github.com/franciscozorrilla/metaGEM.git && cd metaGEM/workflow

Run metaGEM without any arguments to see usage instructions:

bash metaGEM.sh
Usage: bash metaGEM.sh [-t|--task TASK] 
                       [-j|--nJobs NUMBER OF JOBS] 
                       [-c|--cores NUMBER OF CORES] 
                       [-m|--mem GB RAM] 
                       [-h|--hours MAX RUNTIME]
                       [-l|--local]

 Options:
  -t, --task        Specify task to complete:

                        SETUP
                            createFolders
                            downloadToy
                            organizeData
                            check

                        CORE WORKFLOW
                            fastp 
                            megahit 
                            crossMapSeries
                            kallistoIndex
                            crossMapParallel
                            kallisto2concoct 
                            concoct 
                            metabat
                            maxbin 
                            binRefine 
                            binReassemble 
                            extractProteinBins
                            carveme
                            memote
                            organizeGEMs
                            smetana
                            extractDnaBins
                            gtdbtk
                            abundance

                        BONUS
                            grid
                            prokka
                            roary
                            eukrep
                            eukcc

                        VISUALIZATION (in development)
                            stats
                            qfilterVis
                            assemblyVis
                            binningVis
                            taxonomyVis
                            modelVis
                            interactionVis
                            growthVis

  -j, --nJobs       Specify number of jobs to run in parallel
  -c, --nCores      Specify number of cores per job
  -m, --mem         Specify memory in GB required for job
  -h, --hours       Specify number of hours to allocate to job runtime
  -l, --local       Run jobs on local machine for non-cluster usage
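
As a rough illustration of how the options above combine, the sketch below prints a metaGEM command line instead of running it, together with the peak resources it would request if all jobs ran at the same time. The helper `metagem_dry_run` is hypothetical and not part of metaGEM, the values are arbitrary, and the peak estimate assumes that cores and memory are requested per job.

```shell
# Hypothetical helper (not part of metaGEM): prints the command instead of running it.
metagem_dry_run() {
    local task="$1" jobs="$2" cores="$3" mem_gb="$4" hours="$5"
    # Short flags are taken from the usage text above.
    echo "bash metaGEM.sh -t ${task} -j ${jobs} -c ${cores} -m ${mem_gb} -h ${hours}"
    # Assumption: -c and -m apply per job, so the peak request scales with -j.
    echo "peak request: $((jobs * cores)) cores, $((jobs * mem_gb)) GB RAM"
}

# Illustrative values only: 2 parallel fastp jobs, 4 cores and 8 GB each, 2 hour limit.
metagem_dry_run fastp 2 4 8 2
```

Printing the command first makes it easy to check the request against your cluster's limits before submitting anything.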

🧉 Try it now

You can set up and use metaGEM on the cloud by following along with the Google Colab notebook.

Please note that Google Colab does not provide the computational resources necessary to fully run metaGEM on a real dataset. This notebook demonstrates how to set up and use metaGEM by performing the first steps in the workflow on a toy dataset.

💩 Tutorials

metaGEM can be used to explore your own gut microbiome sequencing data from at-home test kit services such as Unseen Bio. The following tutorial showcases the metaGEM workflow on two Unseen Bio samples.

For an introductory metabolic modeling tutorial, refer to the resources compiled for the EMBOMicroCom: Metabolite and species dynamics in microbial communities workshop in 2022.

For a more advanced tutorial, check out the resources we put together for the SymbNET: from metagenomics to metabolic interactions course in 2022.

🏛️ Wiki

Refer to the wiki for additional usage tips, frequently asked questions, and implementation details.

📦 Datasets

🐍 Workflow

Core

  1. Quality filter reads with fastp
  2. Assembly with megahit
  3. Draft bin sets with CONCOCT, MaxBin2, and MetaBAT2
  4. Refine & reassemble bins with metaWRAP
  5. Taxonomic assignment with GTDB-tk
  6. Relative abundances with bwa and samtools
  7. Reconstruct & evaluate genome-scale metabolic models with CarveMe and memote
  8. Species metabolic coupling analysis with SMETANA
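
The numbered core steps can be related to the task names in the usage listing roughly as follows. This mapping is an assumption made by matching tool names, not something stated by the workflow itself, and intermediate helper tasks from the usage listing (for example crossMapSeries, extractProteinBins, or organizeGEMs) are left out. The function `tasks_for_step` is hypothetical, and the loop only prints commands.

```shell
# Hypothetical lookup: core step number -> task names from the usage listing.
# The pairing is inferred from tool names and omits intermediate helper tasks.
tasks_for_step() {
    case "$1" in
        1) echo "fastp" ;;
        2) echo "megahit" ;;
        3) echo "concoct metabat maxbin" ;;
        4) echo "binRefine binReassemble" ;;
        5) echo "gtdbtk" ;;
        6) echo "abundance" ;;
        7) echo "carveme memote" ;;
        8) echo "smetana" ;;
        *) echo "unknown step: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) one command per task, grouped by core step.
for step in 1 2 3 4 5 6 7 8; do
    for task in $(tasks_for_step "$step"); do
        echo "step ${step}: bash metaGEM.sh -t ${task}"
    done
done
```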

Bonus

  1. Growth rate estimation with GRiD, SMEG or CoPTR
  2. Pangenome analysis with roary
  3. Eukaryotic draft bins with EukRep and EukCC

🏗️ Active Development

If you would like to see additional or alternative tools incorporated into the metaGEM workflow, please raise an issue or create a pull request. Snakemake allows workflows to be very flexible, so adding a new rule is as easy as filling out the following template and adding it to the Snakefile:

# Rule names must be valid Python identifiers, so use underscores rather than hyphens
rule package_name:
    input:
        rules.rulename.output
    output:
        f'{config["path"]["root"]}/{config["folder"]["X"]}/{{IDs}}/output.file'
    message:
        """
        Helpful and descriptive message detailing goal of this rule/package.
        """
    shell:
        """
        # Well documented command line instructions go here

        # Load conda environment 
        set +u;source activate {config[envs][package]};set -u;

        # Run tool
        package-name -i {input} -o {output}
        """

🖇️ Publications

The metaGEM workflow has been used in multiple studies, including the following non-exhaustive list:

Plastic-degrading potential across the global microbiome correlates with recent pollution trends
Zrimec, J., Kokina, M., Jonasson, S., Zorrilla, F., Zelezniak, A.
mBio, 2021

Competition-cooperation in the chemoautotrophic ecosystem of Movile Cave: first metagenomic approach on sediments
Chiciudean, I., Russo, G., Bogdan, D.F. et al.
Environmental Microbiome, 2022

The National Ecological Observatory Network’s soil metagenomes: assembly and basic analysis
Werbin, Z.R., Hackos, B., Lopez-Nava, J. et al.
F1000Research, 2022

Microbial interactions shape cheese flavour formation
Melkonian, C., Zorrilla, F., Kjærbølling, I. et al.
Nature Communications, 2023

🍾 Please cite

metaGEM: reconstruction of genome scale metabolic models directly from metagenomes
Francisco Zorrilla, Filip Buric, Kiran R Patil, Aleksej Zelezniak
Nucleic Acids Research, 2021; gkab815, https://doi.org/10.1093/nar/gkab815

📲 Contact

Please reach out with any comments, concerns, or discussions regarding metaGEM.

Gitter chat Twitter LinkedIn email