franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License
203 stars, 42 forks

feat: Implement complementary pre-assembly binning for additional MAG generation #42

Closed by franciscozorrilla 2 years ago

franciscozorrilla commented 3 years ago

I have been thinking about trying this out for a long time, and was reminded of this approach in a recent tweet. Digging into the most recent literature, I came across this paper, which found assemble-first and bin-first approaches to be complementary.

I originally thought that this could be implemented as a complementary draft bin generating approach, with the resulting bins refined and reassembled along with the other three draft bin sets. However, this is likely not possible, since the refinement step requires that all bins be generated from the same assembly.

An alternative approach would be to use pre-assembly binning only to recover genomes that were not reconstructed by the assemble-first approach. Although perhaps a bit naive, this could be implemented based on taxonomy:

  1. Generate refined and reassembled MAGs as usual (henceforth MAGs_A)
  2. Generate MAGs using bin-first approach with LSA (henceforth MAGs_B)
  3. Assign taxonomy to MAGs from steps 1 and 2 using GTDB-Tk
  4. Remove from MAGs_B any bins that have a taxonomic label found in MAGs_A
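
A rough sketch of the filtering in step 4, not part of metaGEM itself. It assumes the taxonomy of each bin is available as a GTDB-style lineage string (as in the `user_genome` and `classification` columns of a GTDB-Tk summary table) and that "taxonomic label" means the species-level label; both are assumptions on my part, and the function names are hypothetical. Bins without a species-level assignment are kept, since they cannot be matched against MAGs_A.

```python
import csv
import io


def read_classifications(tsv_text):
    """Parse a tab-separated summary into {bin_name: lineage}.

    Assumes columns named 'user_genome' and 'classification'.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["user_genome"]: row["classification"] for row in reader}


def species_label(lineage):
    """Return the species-level label (e.g. 's__Escherichia coli'),
    or None if the lineage is not resolved to species."""
    for rank in lineage.split(";"):
        rank = rank.strip()
        if rank.startswith("s__") and len(rank) > 3:
            return rank
    return None


def filter_novel_bins(mags_a, mags_b):
    """Keep only MAGs_B bins whose species label is absent from MAGs_A.

    Both arguments map bin name -> lineage string. Bins in MAGs_B with
    no species-level label are kept (assumption: treat them as novel).
    """
    seen = {species_label(lineage) for lineage in mags_a.values()} - {None}
    return {
        name: lineage
        for name, lineage in mags_b.items()
        if species_label(lineage) not in seen
    }
```

Comparing at species level is the strictest reasonable choice here; comparing at genus level would discard more of MAGs_B, and comparing full lineage strings would behave the same for species-resolved bins.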

An alternative and likely more robust approach could be to simply run dRep on each sample's MAGs_A + MAGs_B to get dereplicated bins. See issue in dRep repo.
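
For the dRep route, a minimal sketch of how the per-sample call could be assembled. The directory layout, the `.fa` extension, and the helper name are hypothetical; only the basic `dRep dereplicate <work_directory> -g <genomes>` invocation is taken from dRep's command line, and no thresholds are set here.

```python
from pathlib import Path


def build_drep_command(sample, mags_a_dir, mags_b_dir, out_root="drep_out", ext=".fa"):
    """Build a dRep dereplicate command for one sample's MAGs_A + MAGs_B.

    Returns the command as a list of arguments (suitable for subprocess.run).
    The directory layout and file extension are assumptions.
    """
    genomes = sorted(
        str(path)
        for directory in (mags_a_dir, mags_b_dir)
        for path in Path(directory).glob(f"*{ext}")
    )
    if not genomes:
        raise ValueError(f"no '{ext}' files found for sample {sample}")
    work_dir = Path(out_root) / sample
    return ["dRep", "dereplicate", str(work_dir), "-g", *genomes]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once dRep is installed; building the command separately makes it easy to inspect before running.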