BinPro / CONCOCT

Clustering cONtigs with COverage and ComposiTion

High read abundance but no bins #211

Closed: zahidrehman closed this issue 4 years ago

zahidrehman commented 5 years ago

Dear all, I have a metagenome assembly of two samples that I am trying to bin. Although I have a high abundance of Proteobacterial reads, I don't get any Proteobacterial bins after binning. What could be the possible reasons? Coverage should not be an issue, since I have high read abundance. I don't expect much strain variation, since these are biofilm samples. I know CONCOCT struggles with binning when there are too few samples. However, I do get bins for other bacteria whose relative read abundance is lower than that of Proteobacteria. Any comments would be highly appreciated. Best regards, Zahid

andand commented 5 years ago

Hi Zahid, How many samples do you have? Have you done any taxonomic annotation of the contigs to figure out where the proteobacterial data ended up? If they are not in any bins, they must be in contigs shorter than the length cutoff for binning. The reason for that can be either low coverage or high intra-population diversity. If you do have proteobacterial contigs in impure or highly incomplete bins, the reason is probably too few samples. Kind regards, Anders
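The check suggested above can be sketched in a few lines of Python: for each taxon, add up how many assembled bases sit in contigs below the binning length cutoff and how many sit at or above it. This is only an illustrative sketch. The `taxonomy` mapping (contig ID to taxon label) is hypothetical and would come from whatever contig classifier you use, and `cutoff=1000` assumes CONCOCT's default length threshold; whether a contig exactly at the cutoff is kept is also an assumption here.

```python
import io
from collections import defaultdict


def contig_lengths(fasta_handle):
    """Yield (contig_id, length) for each record in a FASTA file handle."""
    name, length = None, 0
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use the first whitespace-delimited token of the header as the ID.
            name, length = (line[1:].split() or [""])[0], 0
        elif line:
            length += len(line)
    if name is not None:
        yield name, length


def bases_by_taxon(fasta_handle, taxonomy, cutoff=1000):
    """Sum bases per taxon, split by whether the contig is below the cutoff.

    `taxonomy` is a hypothetical dict mapping contig ID -> taxon label.
    Contigs missing from it are counted as "unclassified".
    """
    totals = defaultdict(lambda: {"below_cutoff": 0, "at_or_above_cutoff": 0})
    for contig_id, length in contig_lengths(fasta_handle):
        taxon = taxonomy.get(contig_id, "unclassified")
        key = "below_cutoff" if length < cutoff else "at_or_above_cutoff"
        totals[taxon][key] += length
    return dict(totals)


# Toy data only; a real run would open the assembly FASTA instead.
demo_fasta = io.StringIO(">c1\nACGTACGT\n>c2 extra\nACG\nTT\n>c3\nA\n")
demo_taxonomy = {"c1": "Proteobacteria", "c3": "Proteobacteria"}
for taxon, counts in bases_by_taxon(demo_fasta, demo_taxonomy, cutoff=5).items():
    print(taxon, counts)
```

If most proteobacterial bases fall in the `below_cutoff` column, the contigs were filtered out before binning, which points to fragmented assembly (low coverage or high intra-population diversity) rather than to the clustering itself.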

On Thu, Sep 27, 2018 at 10:35 AM zahidrehman wrote: [quoted text of the original post]

zahidrehman commented 5 years ago

Dear Anders, I made the assembly from two samples (which were actually duplicates of one sample) sequenced on an Illumina HiSeq 4000. In total I had 10 samples, but the assembler (metaSPAdes) failed to produce a single assembly of all samples, so I had to assemble the samples individually (as duplicates). I have read that the efficiency of CONCOCT decreases below 50 samples. In metagenomics one usually doesn't have that many samples, and sometimes it is not possible to make a single assembly of a large number of samples, as in this case. Is there a way around this limitation on the number of samples? Best regards, Zahid

alneberg commented 5 years ago

Hi again @zahidrehman, sorry for the very late reply. Did you already manage to get around this issue? Did you use all 10 samples in the coverage table? That would be recommended.

Best, Johannes
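To illustrate the suggestion above: even when the assembly was built from two samples, a coverage table can hold one column for every sample whose reads were mapped to it, so all 10 samples can contribute coverage information. The sketch below only shows that layout (one row per contig, one column per sample, tab-separated). The function names, sample names, and header labels are hypothetical, the per-sample mean coverages are assumed to have been computed already, and in practice CONCOCT's own coverage-table script would generate this file from the mapped reads.

```python
import csv
import io


def build_coverage_table(per_sample_coverage):
    """Combine per-sample coverage into rows: one contig per row, one sample per column.

    `per_sample_coverage` is a hypothetical dict: sample name -> {contig ID: mean coverage}.
    Contigs with no coverage value in a sample get 0.0 for that sample.
    """
    samples = sorted(per_sample_coverage)
    contigs = sorted({c for cov in per_sample_coverage.values() for c in cov})
    rows = [["contig"] + samples]
    for contig in contigs:
        rows.append([contig] + [per_sample_coverage[s].get(contig, 0.0) for s in samples])
    return rows


def write_tsv(rows, handle):
    """Write the rows as tab-separated text to an open file handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


# Toy data with two samples; a real table would have one entry per sample (10 here).
demo = {
    "sampleA": {"contig_1": 12.5, "contig_2": 3.0},
    "sampleB": {"contig_1": 0.4},
}
out = io.StringIO()
write_tsv(build_coverage_table(demo), out)
print(out.getvalue())
```

More columns give each contig a longer coverage profile, which is the signal the binning relies on to separate genomes with similar composition.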