franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License

feature: test and compare new binners #76

Open franciscozorrilla opened 3 years ago

franciscozorrilla commented 3 years ago

The binning landscape has changed since the initial development of metaGEM. It would be a good idea to put together a shortlist of novel binning tools for testing, with the ultimate goal of adding and/or replacing binners, e.g. SemiBin.

zoey-rw commented 2 years ago

Hi! I was wondering whether you currently have any plans for integrating new binners, or if any binner seemed particularly promising to you. I haven't tried running Vamb, but it seems like a good candidate for metaGEM, because it is under active development, already has a Snakemake workflow, and can leverage the jgi_summarize_bam_contig_depths approach that metaGEM already uses. Curious if you have thoughts on this!

Thanks, Zoey

franciscozorrilla commented 2 years ago

Hey Zoey!

Thanks for commenting. Vamb has definitely been on my radar. However, based on the TL;DR section of the readme, some tweaking/benchmarking may be needed: their recommended workflow is to concatenate assemblies before mapping, whereas the metaGEM implementation cross-maps each individual assembly. I do see that they provide a Snakefile that may be a good starting point for implementing Vamb in the metaGEM workflow. The CAMI2 paper does not shine a very flattering light on Vamb, although they have outlined a response here and suggested that best practices were not followed.

In other news, SemiBin was recently published with peer review, and it looks like it does quite well compared to the other binners considered in their paper. Unfortunately, SemiBin did not make it into the CAMI2 paper to get a third-party evaluation, and the documentation suggests that assemblies also need to be concatenated before mapping. Based on all this, I would probably start with SemiBin, but Vamb also seems worth trying out. Perhaps the testing/benchmarking process is something that could be facilitated with toolchest, if @lebovic & co have these binners implemented on their infrastructure?
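To make the difference between the two mapping strategies concrete, here is a minimal Python sketch. It is not taken from metaGEM, Vamb, or SemiBin: the function names, the `catalogue.fa` filename, and the `sep` separator are hypothetical, and each binner expects its own contig naming convention, so check its documentation before reusing anything like this.

```python
from itertools import product
from typing import Dict, List, Tuple


def cross_mapping_jobs(samples: List[str]) -> List[Tuple[str, str]]:
    """Cross-mapping: reads from every sample are mapped to every
    individual assembly, giving n * n (reads, assembly) jobs."""
    return [(reads, assembly) for assembly, reads in product(samples, samples)]


def concatenate_assemblies(assemblies: Dict[str, str], sep: str = "C") -> str:
    """Concatenation: merge per-sample FASTA text into a single catalogue.

    Each header is prefixed with its sample ID so that contig names stay
    unique and can be split back by sample. `sep` is an assumed,
    configurable separator, not a confirmed convention of any tool.
    """
    out = []
    for sample, fasta in assemblies.items():
        for line in fasta.splitlines():
            if line.startswith(">"):
                out.append(f">{sample}{sep}{line[1:]}")
            elif line:
                out.append(line)
    return "\n".join(out) + "\n"


def concatenated_mapping_jobs(
    samples: List[str], catalogue: str = "catalogue.fa"
) -> List[Tuple[str, str]]:
    """With a concatenated catalogue, each sample's reads are mapped
    once against the same file, giving n (reads, catalogue) jobs."""
    return [(reads, catalogue) for reads in samples]


if __name__ == "__main__":
    samples = ["S1", "S2", "S3"]
    print(len(cross_mapping_jobs(samples)), "cross-mapping jobs")
    print(len(concatenated_mapping_jobs(samples)), "catalogue mapping jobs")
    print(concatenate_assemblies({"S1": ">contig_1\nACGT\n", "S2": ">contig_1\nGGCC\n"}))
```

The sketch only counts mapping jobs and renames contigs. It says nothing about binning quality, which is what the benchmarking would need to establish.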

Unfortunately, adding/testing new binners is not very high on my priority list due to time/resource constraints and other ongoing projects. I have been thinking about applying for funding to help maintain and update metaGEM with the latest tools, e.g. the Chan Zuckerberg Initiative's Essential Open Source Software for Science program. For now I am more focused on adding support for the reconstruction of single amplified genomes (SAGs), as well as long-read sequencing compatibility.

If you do end up trying some of these binners, please let me know how they compare to those already implemented in metaGEM 💎

Best wishes, Francisco

lebovic commented 2 years ago

Thanks for the mention, @franciscozorrilla!

We don't have those binners implemented yet, but let me know if you'd like us to add them, @zoey-rw.

franciscozorrilla commented 10 months ago

Some more new binners to test when time allows. It is probably a good idea to get around to this in 2024 🤞