Open yjiakang opened 4 years ago
Concatenating contigs from individual assemblies would not work because you will have a large number of duplicate sequences. There are tools out there that can de-replicate and combine multiple assemblies, but I cannot personally recommend this unless it is not possible to co-assemble. Additionally you will lose any low-abundance organisms that could have been assembled if you used all the reads to begin with.
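To get a rough feel for the duplication problem described above, here is a minimal Python sketch that groups contigs with identical sequences across several per-sample assemblies. It is only an illustration: the function names `read_fasta` and `find_exact_duplicates` are made up for this example, it catches exact matches on the same strand only, and real de-replication tools also have to handle near-identical, overlapping, and reverse-complemented contigs.

```python
import hashlib
from collections import defaultdict


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain-text FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_exact_duplicates(fasta_paths):
    """Return groups of (file, header) pairs whose sequences are identical.

    Hypothetical helper: exact, same-strand matches only.
    """
    groups = defaultdict(list)
    for path in fasta_paths:
        for header, seq in read_fasta(path):
            digest = hashlib.sha1(seq.upper().encode()).hexdigest()
            groups[digest].append((path, header))
    # Keep only sequences that occur more than once.
    return [members for members in groups.values() if len(members) > 1]
```

Running `find_exact_duplicates` on the individual assembly files before concatenating them gives a lower bound on how much redundancy the combined file would contain.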
@ursky Thank you very much for your professional answer. I will try it on my 12 samples.
This worries me now. I have 40 metagenomes that I assembled individually with metaSPAdes. I then concatenated all the assemblies into a single file named 'final_assemblies.fasta' and ran it through metaWRAP's binning and bin refinement modules. In the end I was left with 406 MAGs with completeness ≥ 70% and contamination ≤ 5%. Was this the wrong way to do this? Should I redo it from scratch by concatenating the reads first?
It's not ideal, but if you got results you are happy with, it is still OK. The issue is that you can have contigs from different samples (and therefore different taxa) in the same cluster. Ideally, you should have processed the samples independently and then used dRep to get a unique set of MAGs.
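If you want to check how often this happens in bins built from concatenated assemblies, one option is to tag each contig with its sample of origin before concatenating, then count the samples represented in each bin afterwards. The following is a minimal sketch of that idea, not part of metaWRAP or dRep; the `sample|contig` header convention and the function names `tag_headers` and `samples_in_bin` are assumptions made for this example.

```python
from collections import Counter


def tag_headers(sample_id, fasta_lines):
    """Prefix every FASTA header with the sample ID, e.g. '>S01|NODE_1'.

    The '|' separator is an assumed convention for this sketch.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            yield f">{sample_id}|{line[1:]}"
        else:
            yield line


def samples_in_bin(bin_fasta_lines):
    """Count how many contigs in one bin came from each tagged sample."""
    counts = Counter()
    for line in bin_fasta_lines:
        if line.startswith(">"):
            sample_id = line[1:].split("|", 1)[0]
            counts[sample_id] += 1
    return counts
```

A bin whose counts are spread over many samples is a candidate for the mixed-sample problem described above; a bin dominated by one sample is less of a concern. The tagging has to be done on the per-sample assemblies before they are concatenated and binned.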
Awesome! Thanks. I will keep that in mind and concatenate at the read level in the future, as you recommended above. I appreciate the clarification.
Hi, I noticed that you concatenated all the paired-end raw reads into two files, _1.fastq and _2.fastq, then assembled the pooled reads and ran the downstream analysis on that assembly. What I am unsure about is whether it would be OK to concatenate the contigs from the individual assemblies I have already done, concatenate the corresponding raw reads, and use those for the downstream analysis (i.e. binning, bin_refinement, etc.). Thanks in advance.
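For reference, pooling the reads into two files can be sketched in Python as below. This is only an illustration under a few assumptions: each sample has uncompressed files named `<prefix>_1.fastq` and `<prefix>_2.fastq`, each file ends with a newline, and the function name `pool_paired_reads` is made up for this example. The important detail is that samples are appended in the same order to both output files, so that forward and reverse mates stay in matching positions.

```python
import shutil
from pathlib import Path


def pool_paired_reads(sample_prefixes, out_prefix):
    """Concatenate per-sample paired-end FASTQ files into two pooled files.

    Assumes inputs are named '<prefix>_1.fastq' and '<prefix>_2.fastq'
    and are not compressed.
    """
    fwd_out = Path(f"{out_prefix}_1.fastq")
    rev_out = Path(f"{out_prefix}_2.fastq")
    with open(fwd_out, "wb") as fwd, open(rev_out, "wb") as rev:
        # Same sample order for both outputs keeps read pairs aligned.
        for prefix in sample_prefixes:
            for suffix, out in (("_1.fastq", fwd), ("_2.fastq", rev)):
                with open(f"{prefix}{suffix}", "rb") as src:
                    shutil.copyfileobj(src, out)
    return fwd_out, rev_out
```

For example, `pool_paired_reads(["reads/S01", "reads/S02"], "pooled/ALL_READS")` would write `pooled/ALL_READS_1.fastq` and `pooled/ALL_READS_2.fastq`, assuming the `pooled` directory already exists; the paths here are placeholders.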