chanzuckerberg / idseq-workflows

Portable WDL workflows for IDseq production pipelines
https://idseq.net/
MIT License
31 stars 12 forks

re-enable threading in bowtie2 #132

Closed morsecodist closed 3 years ago

morsecodist commented 3 years ago

While on call I discovered and diagnosed an issue with our usage of bowtie2: it now sometimes times out after 5 hours. I re-ran a sample from two years ago and, though nearly everything was identical, the original run took well under an hour while the re-run took over five. After analyzing the diff I found that, as part of my effort to make the pipeline results deterministic, I had added the --seed parameter to bowtie2. This --seed parameter is not compatible with multithreading, so I disabled multithreading. bowtie2 isn't super time-consuming, so this ended up being mostly fine, but it seems we have observed some performance issues with this approach. Though determinism is important, I think our users are far more interested in their samples running in the first place than in determinism (which almost none of them are really aware of). For now I am going to switch back to multithreading and leave a comment. Determinism with multithreading inherently leads to more complexity and overhead, so we may need to do some experiments to see if we can get determinism in a more scalable way.

Here is the ticket about this that we closed: https://app.clubhouse.io/idseq/story/11242/fix-bowtie-timeout-after-10-hours-on-150m-total-read-samples
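
For illustration only, the switch described above can be sketched as a small helper that builds the bowtie2 argument list either in the seeded, single-threaded configuration or in the multithreaded one. This is not the pipeline's actual code: the function name `bowtie2_args`, its parameters, and the single-end `-U` input are assumptions made for the example. Only the bowtie2 flags themselves (`-x`, `-U`, `-S`, `-p`, `--seed`) are real.

```python
import os
from typing import List, Optional


def bowtie2_args(index_prefix: str, reads_fastq: str, out_sam: str,
                 deterministic: bool = False, seed: int = 0,
                 threads: Optional[int] = None) -> List[str]:
    """Build a bowtie2 argument list (hypothetical helper, single-end reads)."""
    args = ["bowtie2", "-x", index_prefix, "-U", reads_fastq, "-S", out_sam]
    if deterministic:
        # Mirrors the earlier setup described in this PR:
        # a fixed seed combined with a single thread.
        args += ["--seed", str(seed), "-p", "1"]
    else:
        # Re-enabled threading: use the requested thread count,
        # falling back to the number of CPUs on the machine.
        n = threads or os.cpu_count() or 1
        args += ["-p", str(n)]
    return args
```

The resulting list could be passed to `subprocess.run(args, check=True)`. Keeping the two modes behind one flag would make it easier to run the experiments mentioned above without editing the command by hand.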