Closed gaioguy closed 3 years ago
I'll write a longer comment on this later today or sometime tomorrow. (I am not opposed to this change, but we should discuss the pros and cons of either approach and see if we can work out something that is as convenient as `ParallelBoxIterator` is for the current algorithms and also works for `adaptive_boxmap`.)
However, I am surprised that this is substantially faster... Sure, the `ParallelBoxIterator` has some overhead due to the work queue management, but it shouldn't be that much slower. What's the result of the Hénon `@btime`?
On my quad core i5 MacBook from 2018 with 8 threads:
184.056 ms (26579 allocations: 7.37 MiB)
30.387 ms (289 allocations: 7.90 MiB)
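For reference, the timings above come from BenchmarkTools' `@btime`. The actual benchmark script is not shown in this thread; a minimal sketch of that kind of measurement, assuming the Hénon map with the classical parameters and a hypothetical point cloud as workload, might look like:

```julia
using BenchmarkTools

# Hénon map with the classical parameters a = 1.4, b = 0.3
henon(x) = (1 - 1.4 * x[1]^2 + x[2], 0.3 * x[1])

# A hypothetical workload: map a cloud of sample points.
# (The real benchmark times map_boxes over a box collection instead.)
points = [(2rand() - 1, 2rand() - 1) for _ in 1:10^5]
@btime map(henon, $points)
```

The `$points` interpolation keeps BenchmarkTools from timing global-variable access, which would otherwise distort the allocation counts.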
Can you try a workload (increase depth) that takes multiple seconds? Just curious :)
With `depth=34`:
32.101 s (847814 allocations: 189.71 MiB)
1.248 s (454 allocations: 353.79 MiB)
Ok, this is definitely surprising. I'll look into the reason for this difference in speed before writing the longer comment.
This PR implements a more elegant and yet substantially faster `map_boxes` function. I went back to the drawing board on this: I spent a lot of time extending the old approach (via the `ParallelBoxIterator`) in the `adaptive_boxmap` branch, yet was unable to make it work. The new function is almost trivial (and yet easily multi-threaded) and can be trivially extended for `adaptive_boxmap`. This is meant to be a proof of concept and would still need to be extended to map into a different `target` and to support the `TransferOperator`. Would this approach have any serious drawbacks?
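The "almost trivial and yet easily multi-threaded" shape described above could be sketched as follows. This is a guess at the structure, not GAIO's actual implementation: the `Box` type, sampling only box centers, and the `map_boxes` signature are all simplifications for illustration.

```julia
using Base.Threads

# Illustrative 2-D box: a center point and a radius per dimension.
struct Box{T}
    center::NTuple{2,T}
    radius::NTuple{2,T}
end

# Sketch of a trivially multi-threaded map over boxes: each thread
# collects the images of its share of boxes into its own set, and the
# per-thread sets are merged at the end (no shared work queue needed).
function map_boxes(f, boxes::Vector{<:Box})
    hits = [Set{NTuple{2,Float64}}() for _ in 1:nthreads()]
    @threads :static for i in eachindex(boxes)
        # Map the box center; a real implementation would map many
        # sample points per box and look up the boxes they land in.
        push!(hits[threadid()], f(boxes[i].center))
    end
    return union(hits...)
end
```

The `:static` schedule pins each iteration range to one thread, so indexing the per-thread sets by `threadid()` is safe; the merge at the end is the only serial step.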