amplab / snap

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
https://www.microsoft.com/en-us/research/project/snap/
Apache License 2.0

Paired read matcher can use enormous memory for large input with many chimeric reads #69

Closed by bolosky 3 years ago

bolosky commented 8 years ago

The paired read matcher reads an input SAM/BAM file in order and emits matched pairs of reads. For a sorted input with lots of chimerically mapped reads, the two ends of a pair can be far apart in the file, so it may be a long time before a mate shows up. In the interim SNAP stores the first end in memory (not only uncompressed, but in a format that is actually pretty wasteful of buffer space).

This can use an inordinate amount of memory for large input files with a high chimeric read fraction. We will need to find some way to mitigate this, probably by spilling to disk.
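To make the spill-to-disk idea concrete, here is a minimal Python sketch. It is not SNAP's code and says nothing about how the eventual fix works; the function `match_pairs`, the parameter `max_in_memory`, and the choice of a temporary SQLite file as the spill store are all assumptions made for illustration. Records are treated as plain `(read_name, record_text)` tuples rather than parsed SAM/BAM.

```python
import os
import sqlite3
import tempfile


def match_pairs(records, max_in_memory=100_000):
    """Yield (first_end, second_end) for records that share a read name.

    `records` is any iterable of (read_name, record_text) tuples in file
    order. Hypothetical helper for illustration, not part of SNAP.
    """
    pending = {}  # read name -> first end seen, still waiting for its mate
    spilled = 0   # number of first ends currently held on disk
    with tempfile.TemporaryDirectory() as tmpdir:
        db = sqlite3.connect(os.path.join(tmpdir, "spill.sqlite"))
        try:
            # Scratch table; nothing is ever committed because the file
            # is deleted when matching finishes.
            db.execute(
                "CREATE TABLE spilled (name TEXT PRIMARY KEY, record TEXT)"
            )
            for name, record in records:
                # Look in memory first, then on disk if anything was spilled.
                mate = pending.pop(name, None)
                if mate is None and spilled:
                    row = db.execute(
                        "SELECT record FROM spilled WHERE name = ?", (name,)
                    ).fetchone()
                    if row is not None:
                        db.execute(
                            "DELETE FROM spilled WHERE name = ?", (name,)
                        )
                        spilled -= 1
                        mate = row[0]
                if mate is not None:
                    yield mate, record
                    continue
                # First end of a pair: hold it until its mate arrives.
                pending[name] = record
                if len(pending) > max_in_memory:
                    # Too many unmatched first ends: move them all to disk.
                    db.executemany(
                        "INSERT INTO spilled VALUES (?, ?)", pending.items()
                    )
                    spilled += len(pending)
                    pending.clear()
        finally:
            db.close()


reads = [("r1", "a1"), ("r2", "b1"), ("r3", "c1"),
         ("r2", "b2"), ("r1", "a2"), ("r3", "c2")]
for first, second in match_pairs(reads, max_in_memory=1):
    print(first, second)
```

The trade-off in this sketch is that once anything has been spilled, every unmatched read costs a disk lookup, so it bounds memory at the expense of throughput. Reads whose mate never appears are silently dropped here; a real matcher would have to report them.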

bolosky commented 3 years ago

Fixed in 1.0.