MaciekAber / pysam

Automatically exported from code.google.com/p/pysam

Quadratic behaviour in header generation #91

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
The recently added patch that writes a SAM-compatible text header when neither
'text' nor a pre-built header is passed has unfortunate quadratic runtime
behavior, because it builds the header by repeated string concatenation. This
makes it practically unusable when there are many references (the current
Salmo salar assembly has 550k contigs: I canceled the run after 20 minutes,
whereas with the patch below it finishes in under 10 seconds).
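
To see why the `+=` loop scales so badly, here is a rough cost model in plain Python (a sketch, not a benchmark). It assumes every `text += line` allocates a brand-new string and copies everything accumulated so far. CPython can sometimes resize a string in place, but that is an implementation detail that code should not rely on. The helper names and the 30-character average line length are hypothetical, chosen only for illustration; the 550,000 figure is the contig count from the report.

```python
# Rough cost model (not a benchmark): characters copied while assembling
# a header from many short "@SQ\tSN:...\tLN:...\n" lines.


def chars_copied_by_concat(line_lengths):
    """Characters copied if every `text += line` builds a new string.

    Assumes no in-place resize: each step copies the whole accumulated
    text plus the new line into a fresh string.
    """
    total = 0
    current_len = 0
    for n in line_lengths:
        current_len += n      # length of the new, longer string
        total += current_len  # all of it is copied into the new allocation
    return total


def chars_copied_by_join(line_lengths):
    """Characters copied by ''.join(parts).

    CPython's str.join sizes the result once up front, so each part is
    copied exactly once.
    """
    return sum(line_lengths)


if __name__ == "__main__":
    # 550,000 contigs as in the report; 30 chars per line is an assumption.
    lengths = [30] * 550_000
    concat = chars_copied_by_concat(lengths)
    join = chars_copied_by_join(lengths)
    print(f"+= loop copies ~{concat:,} chars")
    print(f"join copies     ~{join:,} chars")
    print(f"ratio           ~{concat / join:,.0f}x")
```

With n lines of equal length, the `+=` loop copies about (n + 1) / 2 times as many characters as `join`. The gap grows linearly with the number of references, which is why it is barely noticeable for a few dozen chromosomes but dominates for assemblies with hundreds of thousands of contigs.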

Here's a patch that replaces it with the old 'build a list and join that' 
approach.

--- a/pysam/csamtools.pyx   Tue Jan 17 15:19:03 2012 +0100
+++ b/pysam/csamtools.pyx   Tue Mar 20 12:40:34 2012 +0100
@@ -567,9 +567,10 @@
                 # Optionally, if there is no text, add a SAM compatible header to output
                 # file.
                 if text is None and add_sq_text:
-                    text = ''
+                    text = []
                     for x from 0 <= x < header_to_write.n_targets:
-                        text += "@SQ\tSN:%s\tLN:%s\n" % (referencenames[x], referencelengths[x] )
+                        text.append("@SQ\tSN:%s\tLN:%s\n" % (referencenames[x], referencelengths[x] ))
+                    text = ''.join(text)

                 if text != None:
                     # copy without \0

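For readers less familiar with Cython, here is a plain-Python sketch of the loop before and after the patch, used to check that the change only affects how the text is assembled and not the header it produces. `build_sq_text_concat` and `build_sq_text_join` are hypothetical names (in pysam the logic sits inline in `csamtools.pyx`, as the diff shows), and the sketch iterates with `zip` instead of up to `header_to_write.n_targets`, assuming the name and length lists have the same length.

```python
def build_sq_text_concat(referencenames, referencelengths):
    """Old approach: grow one string with += (quadratic without in-place resize)."""
    text = ''
    for name, length in zip(referencenames, referencelengths):
        text += "@SQ\tSN:%s\tLN:%s\n" % (name, length)
    return text


def build_sq_text_join(referencenames, referencelengths):
    """Patched approach: collect the @SQ lines in a list, join once (linear)."""
    text = []
    for name, length in zip(referencenames, referencelengths):
        text.append("@SQ\tSN:%s\tLN:%s\n" % (name, length))
    return ''.join(text)


# Made-up reference names and lengths, for illustration only.
names = ["contig_1", "contig_2"]
lengths = [1000, 2500]

header = build_sq_text_join(names, lengths)
# Both approaches must yield byte-for-byte the same header text.
assert header == build_sq_text_concat(names, lengths)
assert header == "@SQ\tSN:contig_1\tLN:1000\n@SQ\tSN:contig_2\tLN:2500\n"
print(header, end="")
```

Because the output is identical, the patch is a pure performance fix: callers that rely on the generated @SQ lines see no difference.
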
Original issue reported on code.google.com by finkerna...@mathematik.uni-marburg.de on 20 Mar 2012 at 11:41

GoogleCodeExporter commented 8 years ago
Thanks, you are right.

Applied.

Best wishes,
Andreas

Original comment by andreas....@gmail.com on 7 Jul 2012 at 9:33