GregoryFaust / samblaster

samblaster: a tool to mark duplicates and extract discordant and split reads from sam files.
MIT License

Avoid buffer length issues when marking long SAM lines #13

Closed chapmanb closed 9 years ago

chapmanb commented 9 years ago

While testing with hg38, samblaster dies on very long SAM lines produced by HLA matches that have a large number of alternative matches:

samblaster: New buffer length exceeds maximum while changing field value

This is a small test case which reproduces the issue (although confusingly, it fails consistently on CentOS 6.6 but not at all on Ubuntu 14.04, both with gcc 4.8.2):

https://s3.amazonaws.com/chapmanb/samblaster/samblaster_hg38_linelength.tar.gz

This fix avoids the issue by increasing the maximum buffer length of a line so that it can accommodate these long lines. My C/C++ is terrible, so I'd be happy for a review of how this affects memory usage or why it behaves differently on Ubuntu and CentOS.
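
For illustration only, and not taken from the samblaster source: the error above is raised when a field value is rewritten in a line whose buffer has a fixed maximum length. A general alternative to raising that cap is to rebuild the line in a container that grows on demand. The sketch below assumes a hypothetical helper, `replaceSamField`, that replaces one tab-delimited field in a SAM line held in a `std::string`; the function name and the 0-based field index are assumptions made for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not samblaster's actual code): return a copy of `line`
// with the tab-delimited field at `fieldIndex` (0-based) replaced by `newValue`.
// std::string allocates as much as it needs, so no fixed line-length cap applies.
std::string replaceSamField(const std::string& line,
                            std::size_t fieldIndex,
                            const std::string& newValue)
{
    // Find the start of the requested field by skipping `fieldIndex` tabs.
    std::size_t start = 0;
    for (std::size_t i = 0; i < fieldIndex; ++i) {
        std::size_t tab = line.find('\t', start);
        if (tab == std::string::npos)
            throw std::out_of_range("SAM line has too few fields");
        start = tab + 1;
    }

    // The field ends at the next tab, or at the end of the line.
    std::size_t end = line.find('\t', start);
    if (end == std::string::npos)
        end = line.size();

    // Reserve the exact final size once, then stitch the three pieces together.
    std::string result;
    result.reserve(line.size() - (end - start) + newValue.size());
    result.append(line, 0, start);               // everything before the field
    result.append(newValue);                     // the new field value
    result.append(line, end, std::string::npos); // everything after the field
    return result;
}
```

Marking a duplicate would then amount to something like `replaceSamField(line, 1, "1024")`, since FLAG is the second SAM column and 0x400 (1024) is the duplicate bit. The trade-off is one extra allocation and copy per modified line, in exchange for not having to pick a maximum line length in advance.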

GregoryFaust commented 9 years ago

Fixed in a more robust fashion in release 0.1.22.

chapmanb commented 9 years ago

Brilliant, thanks much for the better fix and update.