BeckResearchLab / assembly_and_binning

Metagenomics assembly and binning pipeline for the Beck Research Lab
MIT License

Reassembly #1

(Open) franciscozorrilla opened this issue 4 years ago

franciscozorrilla commented 4 years ago

Hi, very cool repo! I have a small question about the last few bullet points in the README workflow. I have been thinking of implementing something similar to address the fact that most MAGs generated through binning are highly fragmented.

- Map trimmed reads back to contigs for individual bins
- Extract mapped reads
- Include pairs, even if unmapped
- Reassemble bins with mapped reads using spades

Did you test this out and do you have any code you could share?

Thanks and best wishes, FZ
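The "extract mapped reads, include pairs even if unmapped" steps in the workflow quoted above amount to selecting read pairs by name rather than keeping individual alignments. Below is a minimal Python sketch, assuming the alignments have already been parsed (e.g. from a BAM file) into hypothetical `Alignment` records holding the read name and the contig each mate hit. The names `Alignment` and `reads_for_bin` are illustrative and do not come from the repository.

```python
from typing import Iterable, NamedTuple, Optional, Set


class Alignment(NamedTuple):
    """One alignment record, reduced to the fields this sketch needs (hypothetical)."""
    read_name: str         # shared by both mates of a pair
    contig: Optional[str]  # None if this mate did not map


def reads_for_bin(alignments: Iterable[Alignment], bin_contigs: Set[str]) -> Set[str]:
    """Return names of read pairs to extract for reassembling one bin.

    A pair is kept if at least one mate maps to a contig in the bin. Because
    selection is by read name, the other mate is kept even if it is unmapped.
    """
    return {aln.read_name for aln in alignments if aln.contig in bin_contigs}


alns = [
    Alignment("r1", "contig_A"), Alignment("r1", None),        # mate unmapped
    Alignment("r2", "contig_B"), Alignment("r2", "contig_B"),
    Alignment("r3", "contig_A"), Alignment("r3", "contig_C"),  # mates split across contigs
]
print(sorted(reads_for_bin(alns, {"contig_A"})))
```

In practice the selected names would then be used to pull both mates out of the trimmed FASTQ files before running the secondary assembler.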

dacb commented 4 years ago

We did test it, using Velvet as the secondary assembler. The results did not improve; that is, reassembly did not increase the N50. In subsequent work, we looked at the assembly graphs, and it became clear that the binning was not specific enough. Our communities (methane enrichments from Lake Washington) contain many very closely related species, so there is enough shared sequence that the assembly graph was still "poisoned" with lots of knots.
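For reference, N50 is the contig length at which contigs of that length or longer together contain at least half of the total assembled bases. A fragmented bin therefore has a low N50 even when its total size is reasonable. The following minimal Python sketch implements the standard definition (the function name `n50` is illustrative):

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # reached half of the assembly
            return length
    return 0  # empty assembly


print(n50([100, 200, 300, 400, 500]))  # 500 + 400 = 900 >= 750, so N50 is 400
```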

In the end, we isolated about 55 species of primary interest from the enrichments and sequenced them to high-quality draft status. We then mapped the raw reads from the metagenome sequencing runs to the isolate genomes to measure relative abundances and genetic drift/optimization during isolation.
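The comment does not say how relative abundance was computed from the mapped reads. One common approach, shown here only as an assumption, normalizes each isolate genome's mapped-read count by its length, so that larger genomes are not over-counted, and then rescales the values to sum to 1. The genome names, read counts, and lengths below are hypothetical. Reads that map to none of the isolates are simply ignored in this sketch.

```python
from typing import Dict


def relative_abundance(
    mapped_reads: Dict[str, int], genome_length_bp: Dict[str, int]
) -> Dict[str, float]:
    """Length-normalized relative abundance of each reference genome (assumed method)."""
    # Reads per base corrects for larger genomes attracting more reads.
    density = {g: mapped_reads[g] / genome_length_bp[g] for g in mapped_reads}
    total = sum(density.values())
    if total == 0:
        return {g: 0.0 for g in density}
    # Rescale so the abundances across genomes sum to 1.
    return {g: d / total for g, d in density.items()}


abund = relative_abundance(
    {"isolate_1": 3000, "isolate_2": 1000},
    {"isolate_1": 3_000_000, "isolate_2": 1_000_000},
)
print(abund)  # equal read density, so each isolate gets 0.5
```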

franciscozorrilla commented 4 years ago

Thanks for the response! I tried the bin reassembly module of metaWRAP and found that it only marginally improves the bins' contamination scores, at the expense of completeness, which may be a necessary trade-off. As in your findings, reassembly did not improve the N50, which is very disappointing. Perhaps a reference-guided secondary assembly using high-quality bins/MAGs would be worth trying?

If you are still trying to optimize your binning, I would recommend using CONCOCT instead of, or in combination with, MetaBAT. metaWRAP has a good bin refinement module that combines the binning output from different binners. The figure below shows the number of bins obtained with each binner on my gut metagenomics dataset at two contamination cutoffs (10% and 20%).

[Figure: number of bins obtained with each binner at 10% and 20% contamination cutoffs]

Not only does the number of bins improve, but so do their completeness and contamination scores (estimated using CheckM).

[Figure: CheckM completeness and contamination scores of the refined bins]

Refer to the first figure's legend to see which colors correspond to which binning tools. Note that these two figures show only the bin refinement results; I am still working on the bin reassembly comparison.
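The bin-count comparison in the first figure comes down to counting, for each binner, the bins whose CheckM contamination falls at or below a given cutoff. The minimal sketch below assumes the CheckM results have already been parsed into records. The 50% completeness floor, the binner labels, and the quality values are hypothetical. The sketch only reproduces the counting, not metaWRAP's refinement step that combines bins across binners.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class BinQuality(NamedTuple):
    binner: str           # label of the binning tool (illustrative)
    completeness: float   # percent, as estimated by CheckM
    contamination: float  # percent, as estimated by CheckM


def bins_passing(
    bins: Iterable[BinQuality], max_contamination: float, min_completeness: float = 50.0
) -> Counter:
    """Count bins per binner that meet both quality thresholds (floor is an assumption)."""
    return Counter(
        b.binner
        for b in bins
        if b.contamination <= max_contamination and b.completeness >= min_completeness
    )


# Hypothetical CheckM results for three binners.
bins = [
    BinQuality("metabat2", 92.1, 3.4),
    BinQuality("metabat2", 71.0, 15.2),
    BinQuality("concoct", 88.5, 8.9),
    BinQuality("concoct", 64.0, 19.7),
    BinQuality("maxbin2", 40.2, 1.1),  # fails the completeness floor
]
for cutoff in (10.0, 20.0):
    print(cutoff, dict(bins_passing(bins, cutoff)))
```

Raising the contamination cutoff from 10% to 20% admits more bins per binner, which is the effect the first figure compares.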