Open Thexiyang opened 6 years ago
I actually took the Binning_refiner bins out of the final plot because I thought it was confusing, since the module is also called Bin_refinement. I should probably remove that from the tutorial... If you are curious what that would look like, the other figure has binsABC, which is actually the same as Binning_refiner. It is the result of running Binning_refiner on all three inputs.
And by the two empty bins, do you mean they are .fa files with a size of 0 bytes? Is there anything in them?
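One quick way to answer that is to list every .fa file in the bin folder whose size is exactly 0 bytes. This is only a minimal sketch: `find_empty_bins` is a hypothetical helper, not part of metaWRAP, and the folder you pass in (for example the metaWRAP_bins folder mentioned in this thread) is whatever your own output directory is called.

```python
from pathlib import Path


def find_empty_bins(bin_dir, suffix=".fa"):
    """Return the sorted names of files in bin_dir that end in suffix and are 0 bytes.

    Hypothetical helper for illustration; it only inspects file sizes and
    does not parse the FASTA content.
    """
    return sorted(
        p.name
        for p in Path(bin_dir).iterdir()
        if p.is_file() and p.name.endswith(suffix) and p.stat().st_size == 0
    )
```

Calling `find_empty_bins("metaWRAP_bins")` from the refinement output directory would then return the names of any zero-byte bins, so they can be skipped in downstream steps.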
Thanks for the explanation. Now I get it. I suggest removing it from the tutorial, as it might confuse beginners like me.
And yes, they are 0 bytes in size. The others are fine. So in total I have 264 good bins plus 2 bins of size 0. I just did not understand why they are there. I should mention that metaWRAP significantly improved the bin quality.
Just another question. I have two bins with the highest abundance according to the Quant_bins module, but their completeness values are the lowest (only 50%). I define good bins with -c 50 -x 10. What could be the reason for this? I would expect them to have good completeness given their high abundance.
Can you check if those two bins are in the metaWRAP.stats file?
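A check like this can also be scripted. The sketch below assumes the stats file is a tab-delimited table with a header row containing columns named `bin`, `completeness`, and `contamination`; that layout is an assumption, so adjust the column names to whatever your file actually contains. The function names are hypothetical, and whether the real -c 50 -x 10 cut is inclusive at the boundaries is also an assumption.

```python
import csv


def read_stats(path):
    """Read a tab-delimited stats table with a header row into {bin_name: row}.

    Assumes a column called "bin" holds the bin name (an assumption).
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["bin"]: row for row in reader}


def missing_bins(stats, bin_names):
    """Return the bin names that have no row in the stats table."""
    return [name for name in bin_names if name not in stats]


def passes_thresholds(row, min_completeness=50.0, max_contamination=10.0):
    """Mimic a cut like -c 50 -x 10; inclusive boundaries are assumed here."""
    return (
        float(row["completeness"]) >= min_completeness
        and float(row["contamination"]) <= max_contamination
    )
```

For example, `missing_bins(read_stats("metaWRAP.stats"), ["bin.1", "bin.2"])` would return whichever of those (hypothetical) bin names have no row in the table.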
They are not there. CheckM just ignored them.
One more thing, are they in the binsO folder in the work directory?
they are in binsO
But they are empty there too, right?
yes, the same. all 0 size
And I'm guessing they are also in binsM, but are not empty?
Sorry, I misunderstood your question. Yes, you are right!
I found the issue. It looks like the de-replication stage of the bin consolidation resulted in two bins that have no contigs at all. This is an artifact resulting from your low min completion parameter. Basically, ignore them! Everything is good.
For future users, I put a patch into metaWRAP v=0.8.4 that fixes this. It will come out in the next couple of weeks.
Thanks for your feedback!
As for your other question about high-abundance bins with poor completion metrics, this is unfortunately very common. I see it in my data all the time. The reason is that these high-abundance species also often have high strain heterogeneity. This confuses both the assembler and the function that estimates contig coverage, resulting in poor bins. If you really care about those organisms, you can try to assemble and bin single samples individually (or in small groups) in the hope that this reduces the coverage and heterogeneity to the point where you can assemble and bin them better.
Thanks!
What about reassembly? My last attempt with the reassemble module did not work out, as it got stuck on one bin for almost 12 hours. Would it be possible to improve the completeness of these target bins by reassembling them? Should I give it another try?
Bin reassembly will most likely moderately increase the bin completion and significantly reduce bin contamination. It won't increase the completion that much. Have a look at the reassembly benchmarks in the publication.
And yeah, the reassembly can be very slow for bins that have a very high number of reads mapping to them. The module runs on all the bins in parallel (limited by your thread count, of course), but with 1 thread per bin, which is why it's so slow for those very high-abundance ones. It speeds things up for most users, but not all...
I actually just released metaWRAP v=0.8.4, which has a new parallelization option. Now you can choose to run without the parallelization feature, which means the bins will be reassembled one by one, but using all the threads available. This will help you overcome your issue with that one bin!
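To see why the one-by-one mode can help when a single bin dominates, here is a toy model of the two scheduling strategies described above. It is a sketch under strong assumptions and not metaWRAP's actual code: the function names are hypothetical, each bin's cost is given as made-up single-thread hours, and the one-by-one mode is assumed to scale perfectly with thread count, which real assemblers do not, so that figure is optimistic.

```python
import heapq


def parallel_makespan(work, threads):
    """Total time when each bin gets 1 thread and up to `threads` bins run at once.

    Bins are handed, in list order, to whichever worker frees up first.
    """
    workers = [0.0] * threads
    heapq.heapify(workers)
    for hours in work:
        earliest_free = heapq.heappop(workers)
        heapq.heappush(workers, earliest_free + hours)
    return max(workers)


def serial_makespan(work, threads):
    """Total time when bins run one at a time, each using all threads.

    Assumes ideal linear speedup, which is an optimistic simplification.
    """
    return sum(hours / threads for hours in work)


# Made-up example: one very high-abundance bin and seven small ones, 8 threads.
example_work = [48, 1, 1, 1, 1, 1, 1, 1]
```

With these illustrative numbers, `parallel_makespan(example_work, 8)` is bounded below by the 48-hour bin running on a single thread, while `serial_makespan(example_work, 8)` spreads that same bin across all 8 threads. When all bins are of similar size the two modes come out much closer, which is why the per-bin parallel mode works well for most data sets.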
thanks. I will update it to the new version and rerun reassembly.
Hi, I am checking the data from the Bin_refinement module. But I did not find the Binning_refiner.stats, Binning_refiner as mentioned in Usage_tutorial.md. But the others are all there. And there are two empty bins in the metaWRAP_bins, which I think should be removed. Let me know if this can be an issue. Thanks,