Closed transition-bio-1 closed 11 months ago
Hi Sasha,
Great that you are getting hands-on experience with bioinfo tools, happy to help!
The short answer is: no ❌
The long answer is: metaGEM was developed for use on Linux-based high-performance computing clusters, because personal computers (Mac or Windows) generally do not have the computational resources to efficiently store and analyze real metagenomics datasets. Even if you could set up metaGEM on your MacBook, you would have a very hard time with many steps in the workflow, e.g. short read assembly, which typically requires multiple cores and significant RAM. So while it should be possible to install the dependencies and set up metaGEM on a MacBook, this would be of very limited use. Instead, I would suggest you get access to your institution's high-performance computing cluster, where you should be able to set up metaGEM ✅
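See also issues #16 https://github.com/franciscozorrilla/metaGEM/issues/16 and #122 https://github.com/franciscozorrilla/metaGEM/issues/122 for more info on this topic. You may also be interested in this Google Colab notebook https://colab.research.google.com/drive/1I1S8AoGuJ9Oc2292vqAGTDmZcbnolbuj#scrollTo=awiAaVwSF5Fz that sets up metaGEM and runs assembly on a toy set of samples.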
Best wishes and good luck! Francisco
Thanks for a quick response!
I figured that was probably going to be the answer. The M2 machines are actually getting up there in cores and RAM availability for smaller jobs. I have a somewhat unique use case with small enough datasets that I was hoping to crunch at a leisurely pace :)
Still. A great workflow and pretty instructive. Thank you!
Sasha Milshteyn, PhD Founder, Transition Bio[mining] Tel: +1.415.680.3435
It's true, I also think that for certain tasks/datasets it would be nice to run some analysis with metaGEM on a MacBook (#123). It's not at the top of my priority list, but I will try to get to this soon 💎
Was curious about this so looked into it earlier than expected.
There is good news and bad news: as a temporary workaround you can follow the instructions from the manual setup to install dependencies from yml files.
Unfortunately, gtdbtk is not available on osx because a key dependency, pplacer, is only available for Linux. In any case, gtdbtk is far too computationally intensive to run on a MacBook, even with M1/M2. Also, as you already identified, CONCOCT v1.1.0 is not available for osx, so you would have to settle for the earlier v0.4.2 for now.
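If you want to check availability like this yourself, a minimal sketch along these lines may help. It assumes conda is installed and that the packages come from the conda-forge and bioconda channels; `check_platform` is a hypothetical helper name used only for illustration, not part of metaGEM.

```shell
# Sketch: report whether each package spec has a build for a given platform.
# check_platform is a hypothetical helper, not part of metaGEM or conda.
check_platform() {
  platform="$1"
  shift
  for pkg in "$@"; do
    # conda search exits non-zero when no matching package is found
    if conda search --override-channels -c conda-forge -c bioconda \
         --subdir "$platform" "$pkg" >/dev/null 2>&1; then
      echo "$pkg: available"
    else
      echo "$pkg: missing"
    fi
  done
}

# Example call (needs network access, so it is left commented out):
# check_platform osx-64 concoct=1.1.0 concoct gtdbtk
```

Passing a pinned spec such as `concoct=1.1.0` next to the bare package name shows at a glance whether only the pinned version is the problem.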
First create a modified version of the metaGEM_env.yml file, with the CONCOCT version pin relaxed and gtdbtk removed:
name: metagem
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- bedtools>=2.29.2
- bwa>=0.7.17
- concoct
- diamond>=2.0.6
- fastp>=0.20.1
- maxbin2>=2.2.7
- megahit>=1.2.9
- metabat2>=2.15
- r-base>=3.5.1
- r-gridextra>=2.2.1
- r-tidyverse
- r-tidytext
- samtools>=1.9
- snakemake>=5.10.0,<5.31.1
Then use mamba to set up the dependencies:
$ mamba env create --prefix ./envs/metagem -f metaGEM_env.yml
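Once the environment has been created and activated, a quick sanity check that the main executables resolve on your PATH could look like the sketch below. `check_tools` is a hypothetical helper, and the executable names in the commented example are my assumption of what the packages install, so adjust them if yours differ.

```shell
# Sketch: confirm that the expected executables resolve on PATH.
# check_tools is a hypothetical helper, not part of metaGEM.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok       $tool"
    else
      echo "missing  $tool"
      missing=1
    fi
  done
  # Non-zero return status if at least one tool was not found
  return "$missing"
}

# Example call, after `conda activate ./envs/metagem`
# (executable names are assumptions):
# check_tools bedtools bwa concoct diamond fastp megahit metabat2 samtools snakemake
```

The function returns a non-zero status when anything is missing, so it can also be used as a guard before launching a workflow step.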
As for the metawrap env, you can refer to their own installation details, and there is also some documentation in the above-linked metaGEM manual setup wiki. This should work:
mamba create -y -n metawrap-env python=2.7
conda activate metawrap-env
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda config --add channels ursky
mamba install biopython blas=2.5 blast=2.6.0 bmtagger bowtie2 bwa checkm-genome fastqc kraken=1.1 kraken=2.0 krona=2.7 matplotlib maxbin2 megahit metabat2 pandas prokka quast r-ggplot2 r-recommended salmon samtools=1.9 seaborn spades trim-galore
Hope this helps!
Amazing! Will give this a shot over the weekend. Thank you so much for being so responsive.
Hi Francisco,
I'm relatively inexperienced in bioinformatics and maybe there is an easy workaround that I'm missing, but I am struggling to figure out if this workflow can be run on a mac M2 machine. At the very least concoct v1.1.0 is not available for osx-64 (only 0.4.2) and I'm not sure about other dependencies. Do you have any suggestions for how to set up metaGEM on a mac? The workflow itself looks excellent and I would love to try it out.
Best, ~s.