Closed AntonJMLarsson closed 2 years ago
This looks great! Thank you for the PR!! Can I ask about the 5 threads in the Alignment file import? Should I parameterize that, or is it a magic number?
I've found that it usually speeds up the reading a little. Beyond 5 threads the returns diminish, so using more than that is probably not helpful. But as you can see, I decided to remove it in the next commit, since it should depend on the -c option, which is not currently an argument to this particular script. If it were implemented as an argument, I'd do something like threads = min(5, cores) to ensure that no more cores are used than were specified.
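For illustration, here is a rough sketch of how that cap could look if the script did take a core count. The `-c`/`--cores` argument, the `reader_threads` helper and the `MAX_READER_THREADS` constant are hypothetical names, not part of the current script, and the final comment assumes the file is opened with pysam's `AlignmentFile`, which accepts a `threads` argument.

```python
import argparse

# Beyond roughly this many threads, reading rarely gets faster.
MAX_READER_THREADS = 5


def reader_threads(cores: int, cap: int = MAX_READER_THREADS) -> int:
    """Return min(cap, cores), but never less than one thread."""
    return max(1, min(cap, cores))


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical option: the script does not currently accept it.
    parser.add_argument("-c", "--cores", type=int, default=1)
    return parser.parse_args(argv)


args = parse_args(["-c", "3"])
threads = reader_threads(args.cores)
# The value could then be passed on when opening the file, e.g.
# pysam.AlignmentFile(path, "rb", threads=threads)
```

The extra `max(1, ...)` only guards against a zero or negative core count; otherwise this is the same `min(5, cores)` idea.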
Regardless, most of the speed-up certainly comes from the two changes detailed in the first comment.
I made two changes to split_barcoded_bam.py that together speed it up roughly 10-fold:
Both changes turn these two tasks from linear-time to constant-time operations, thereby speeding up the overall code.
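As a generic illustration of the linear-to-constant-time idea (not the actual diff; the barcodes and function names below are made up), replacing a scan over a list with a hash-based lookup looks roughly like this:

```python
# Made-up cell barcodes, purely for illustration.
barcodes = ["AAAC", "AAAG", "AACT"]


def find_linear(barcode, known=barcodes):
    """Linear time: list.index scans the list element by element."""
    return known.index(barcode) if barcode in known else None


# Build the index once; each later lookup is constant time on average.
barcode_index = {bc: i for i, bc in enumerate(barcodes)}


def find_constant(barcode, index=barcode_index):
    """Constant time on average: a single dictionary (hash) lookup."""
    return index.get(barcode)
```

Both functions return the same answer; the difference only shows up when the lookup runs once per read over a large barcode list.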
Best, Anton