bonsai-team / matam

Mapping-Assisted Targeted-Assembly for Metagenomics
GNU Affero General Public License v3.0

Reduce memory usage of scaffolding step - Limit size of SAM file from aligning contigs against complete ref db #97

Open ppericard opened 4 years ago

ppericard commented 4 years ago

Right now, we use SortMeRNA to align contigs against the complete ref db, and we output all alignments. This can produce very large SAM files, because some highly conserved contigs align against almost every ref sequence. In the next sub-step, when reading that SAM file with Python, we load all alignments of the same contig into memory at once, which can lead to huge memory usage.
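
To see which contigs are responsible, one option is to count alignments per contig while streaming the SAM file, so that nothing is held in memory. The following is only a minimal sketch, not MATAM code: the function names are hypothetical, and it assumes that all alignments of a given contig appear on consecutive lines of the SAM file.

```python
from itertools import groupby


def iter_sam_records(handle):
    """Yield (query_name, ref_name) for each alignment line of a SAM stream."""
    for line in handle:
        # Skip header lines (they start with '@') and blank lines
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # SAM column 1 is QNAME (here the contig), column 3 is RNAME
        yield fields[0], fields[2]


def count_alignments_per_contig(handle):
    """Yield (contig, n_alignments), one contig at a time.

    Assumption: alignments of the same contig are contiguous in the file.
    Alignments are counted lazily and never stored in a list.
    """
    records = iter_sam_records(handle)
    for contig, group in groupby(records, key=lambda rec: rec[0]):
        # Ignore unmapped records, which have '*' as reference name
        n_alignments = sum(1 for _, ref in group if ref != "*")
        yield contig, n_alignments
```

It could be used as `for contig, n in count_alignments_per_contig(open(path)): ...`, where `path` points to the SAM file written by the alignment step. Contigs with a count close to the number of ref sequences are the ones that blow up memory in the next sub-step.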

We can imagine several complementary solutions to reduce this RAM and disk space usage: