Closed ASLeonard closed 3 years ago
Hi Brian, I was sharing what I found improved execution for my data/cluster. I'll happily defer to you for more reasoned improvements as there is still active development going on.
Redundant pull after Brian hugely improved the parallelisation.
I went back to figure out why the threading was generally quite unbalanced in the varMer call, and it seems to arise from the default chunking of loop iterations across threads. Processing a chromosome is obviously very different from processing a small unplaced contig, so the workload is poorly balanced. I ended up testing `static` -> `dynamic` scheduling and found about a 25% improvement in runtime (although with some variance due to different nodes).
I also moved the private variables into the loops, to match the recent change to `seq`, for consistency and clarity. The flush pragmas are removed in favour of a reduction; a flush is implied anyway in a critical pragma.
NB: the order of output is lost with dynamic scheduling, but this isn't necessarily important.
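For anyone reading along, here is a minimal sketch of the pattern described above. It is not the actual varMer code: `processSequence`, `processAll`, `lengths` and `order` are hypothetical names made up for illustration, and the per-sequence "work" is just a loop whose cost grows with sequence length. It assumes an OpenMP-capable compiler (e.g. built with `-fopenmp`); without that flag the pragmas are ignored and the loop simply runs serially.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-sequence work: cost grows with the
// sequence length, so a chromosome is far more expensive than a small contig.
static std::uint64_t processSequence(std::uint64_t length) {
  std::uint64_t acc = 0;
  for (std::uint64_t i = 0; i < length; i++)
    acc += i & 1;   // counts the odd values of i in [0, length)
  return acc;
}

// Processes every sequence and returns the summed result. `order` records the
// index of each sequence as it finishes, to show that completion order is not
// guaranteed under dynamic scheduling.
std::uint64_t processAll(const std::vector<std::uint64_t> &lengths,
                         std::vector<std::size_t> &order) {
  std::uint64_t total = 0;
  const long n = static_cast<long>(lengths.size());

  // schedule(dynamic): idle threads pick up the next iteration, so one huge
  // sequence does not leave the other threads waiting on a fixed chunk.
  // reduction(+:total): each thread sums privately and the partial sums are
  // combined at the end, so no explicit flush is needed.
  #pragma omp parallel for schedule(dynamic) reduction(+:total)
  for (long s = 0; s < n; s++) {
    // Declared inside the loop body, so automatically private to each thread.
    std::uint64_t local = processSequence(lengths[s]);
    total += local;

    // A critical section serialises the shared write (and implies a flush).
    #pragma omp critical
    order.push_back(static_cast<std::size_t>(s));
  }
  return total;
}
```

Calling `processAll` with one very large length and many tiny ones is the case where `schedule(dynamic)` would be expected to help most. The contents of `order` can differ between runs, which is the loss of output ordering mentioned in the NB.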