LANL-Bioinformatics / MeGAMerge

MeGAMerge (A tool to merge assembled contigs, long reads from metagenomic sequencing runs)

Improved Contigs Length at the Expense of Sequence Quality (introduced ambiguous bases) #9

Closed Roli-Wilhelm closed 6 years ago

Roli-Wilhelm commented 6 years ago

Hi there! I've managed to run MeGAMerge on two sizeable metagenome assemblies, and it seemed to do a good job of finding overlapping contigs, but it introduced a large number of ambiguous bases (see the QUAST output below). I assume this is because there were conflicting bases when the sequences were merged. It appears that minimus2 introduces the Ns, but I haven't been able to find any information about how to manage this undesirable behaviour.

[Image: QUAST comparison of assemblies]

Roli-Wilhelm commented 6 years ago

After mapping reads, it is clear that those Ns do correspond to nucleotide variation at those sites. For others interested in correcting these Ns (since some bioinformatic tools do not handle ambiguous bases well), I recommend writing a Python script or simply using samtools 'mpileup' to build a consensus sequence with majority-rule base calling.
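
For anyone taking the Python route, here is a minimal sketch of the majority-rule idea. It is not code from this thread or from MeGAMerge. The function name `resolve_ambiguous_bases`, the `min_depth` parameter, and the input layout (a dict of 0-based position to base counts) are all assumptions made for illustration. How the counts are collected from the mapped reads (for example by parsing a pileup) is left out.

```python
from collections import Counter


def resolve_ambiguous_bases(contig, pileup_counts, min_depth=1):
    """Replace each N in `contig` with the most frequent read base at that site.

    `pileup_counts` (hypothetical layout) maps a 0-based position to a Counter
    of the bases observed in reads mapped over that position.
    Sites covered by fewer than `min_depth` reads are left as N.
    Ties are broken by whichever base was counted first.
    """
    resolved = []
    for pos, base in enumerate(contig):
        if base.upper() == "N":
            counts = pileup_counts.get(pos, Counter())
            depth = sum(counts.values())
            if depth > 0 and depth >= min_depth:
                # Majority rule: take the single most common base.
                base = counts.most_common(1)[0][0]
        resolved.append(base)
    return "".join(resolved)


# Toy example: position 3 has read support, position 6 has none.
contig = "ACGNTTNA"
counts = {3: Counter({"G": 7, "A": 2})}
print(resolve_ambiguous_bases(contig, counts))  # ACGGTTNA
```

Leaving uncovered sites as N is a deliberate choice in this sketch: it avoids guessing a base where no reads map.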