liangyong1991 opened 4 years ago
Yes, but use --megahit during assembly and bin with only metabat2 for speed. I answered questions like this in depth in some of the other issue threads.
Thanks a lot for your advice, and I will try it. Do you know how long it would probably take if I use all three binning tools?
I personally wouldn't try - it might take weeks. MaxBin and CONCOCT do not scale very well with enormous datasets. One trick you can use to speed up the binning process (with any binner) is to throw away contigs smaller than 2 kb, 3 kb, or even 5 kb - this will make the bins somewhat less complete, but it significantly reduces the search space.
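The length-filtering trick above can be sketched as a short script. This is a toy illustration, not part of metaWRAP itself; the filenames, cutoff, and `filter_contigs` helper are made up for the example, and on real data a tool like seqkit can do the same filtering much faster.

```python
# Toy sketch: drop contigs shorter than a minimum length from a FASTA
# before binning, to shrink the binner's search space.
# The 2 kb cutoff and record names are illustrative assumptions.

def filter_contigs(fasta_lines, min_len=2000):
    """Return (header, sequence) pairs for contigs of at least min_len bases."""
    header, seq, records = None, [], []
    for line in list(fasta_lines) + [">"]:  # trailing ">" is a sentinel that flushes the last record
        line = line.strip()
        if line.startswith(">"):
            if header is not None and len("".join(seq)) >= min_len:
                records.append((header, "".join(seq)))
            header, seq = line, []
        elif line:
            seq.append(line)
    return records

# Hypothetical example: three contigs, only two pass the 2 kb cutoff
contigs = [">c1", "A" * 2500, ">c2", "A" * 500, ">c3", "A" * 3000]
kept = filter_contigs(contigs, min_len=2000)
print([header for header, seq in kept])  # → ['>c1', '>c3']
```

On real assemblies the same effect is achieved on the command line (for example with seqkit's minimum-length filter) without loading anything into Python.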
I have a similar question about large-scale data: about 200 samples with 6 TB in total. How should I assemble it? Can I split it into 4 groups of 50 samples (about 400 Gb in total per subgroup) and assemble each group? Do the number of samples and the total data size per group have any effect on the assembly results? I am looking forward to your reply. Thanks a lot!
With the depth you have you could just assemble and bin all 200 individually, and then use dRep to get unique bins/MAGs, but then you would miss more rare species that didn't have enough coverage in any one sample but might have come up if you concatenated some of the data. Grouping a lot of data has the opposite problem, where some of the very high abundance species won't assemble well because of extra strain heterogeneity. If you decide to process very large chunks of data I would recommend megahit for assembly and metabat2 for binning - the other methods don't scale too well. You could also do both of these approaches and then use a combination of dRep and manual curation to cherry-pick the best MAGs of each species, depending on which protocol they assembled/binned best in.
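The cherry-picking step described above can be sketched as a small scoring routine. This is a toy illustration only: it assumes the MAGs have already been clustered into species groups (as dRep's ANI clustering would do), and the score formula `completeness - 5 * contamination` is an assumption loosely modeled on dRep-style quality scoring, not dRep's exact formula.

```python
# Toy sketch: from MAGs recovered by both the per-sample and the grouped
# protocols, keep the highest-scoring genome in each species cluster.
# All names, clusters, and the scoring formula are illustrative assumptions.

def best_per_cluster(mags):
    """mags: dicts with 'name', 'cluster', 'completeness', 'contamination' (percent)."""
    best = {}
    for mag in mags:
        score = mag["completeness"] - 5 * mag["contamination"]  # assumed quality score
        cluster = mag["cluster"]
        if cluster not in best or score > best[cluster][0]:
            best[cluster] = (score, mag["name"])
    return {cluster: name for cluster, (score, name) in best.items()}

# Hypothetical inputs: the same species recovered by two protocols
mags = [
    {"name": "sampleA_bin1", "cluster": "sp1", "completeness": 95, "contamination": 4},
    {"name": "coassembly_bin7", "cluster": "sp1", "completeness": 90, "contamination": 1},
    {"name": "sampleB_bin3", "cluster": "sp2", "completeness": 80, "contamination": 2},
]
print(best_per_cluster(mags))  # → {'sp1': 'coassembly_bin7', 'sp2': 'sampleB_bin3'}
```

In practice dRep performs both the clustering and the scoring for you; manual curation then overrides its pick where you know one protocol produced a cleaner genome.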
Ultimately I can't say what will give you the best results. It really depends on the complexity and sequencing depth of individual samples, the priorities of your study (do you care more about the major species or the rare ones?), and your available resources. I would start by experimenting a bit with a few samples or groups to find what works best in your case.
Thanks a lot for your comment - very deep and reasonable insights on the metagenomics assembly issue!
I have 100 samples with 3 TB of data in total. Is metaWRAP suitable for this project, or are there other solutions?