cosmodesi / pyrecon

package for BAO reconstruction
BSD 3-Clause "New" or "Revised" License

Implementation of MultiGridReconstruction scales badly with nthreads #1

Closed by adematti 2 years ago

adematti commented 2 years ago

Increasing the number of threads in MultiGridReconstruction does not lead to faster reconstruction. Need to understand why (and hopefully improve it).

seshnadathur commented 2 years ago

In my tests, MultiGridReconstruction is now more than 10 times faster when run on 16 threads: for nmesh=512 it takes ~55-60 seconds, compared to >650 s before the compilation flags were changed. It now runs about as fast as IterativeFFTParticleReconstruction, and only 30%-40% slower than IterativeFFTReconstruction.
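As a quick check of the "factor of >10" figure, here is a minimal sketch that computes the speed-up from the timings quoted above. The helper name `speedup` is made up for illustration and is not part of pyrecon; 650 s is only a lower bound on the earlier runtime, so the true factor may be larger.

```python
def speedup(before_s, after_s):
    """Return how many times faster the new run is than the old one."""
    if before_s <= 0 or after_s <= 0:
        raise ValueError("timings must be positive")
    return before_s / after_s


# Timings quoted in the comment (nmesh=512, 16 threads).
before = 650.0               # lower bound: ">650 s" before changing the flags
after_range = (55.0, 60.0)   # "~55-60 seconds" after the change

speedups = [speedup(before, t) for t in after_range]
print(f"speed-up between {min(speedups):.1f}x and {max(speedups):.1f}x (or more)")
```

With these numbers the factor comes out between roughly 10.8 and 11.8, consistent with the ">10" quoted in the comment.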

I can test the scaling in more detail with more than 16 threads when the cluster is less busy (next week?), but for now this issue looks solved.

adematti commented 2 years ago

Excellent, thanks a lot for the feedback! For the record, I just had to specify the -O3 compilation flag, which I had previously forgotten, to get this large speed-up. Closing the issue.
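Since the slowdown came from a missing optimization flag, a small check like the one below can help catch the same mistake. This is only a sketch: the helper `optimization_level` is hypothetical, and it inspects the CFLAGS that the Python interpreter was built with, which is an assumption about where the flags come from and not necessarily what the pyrecon build uses.

```python
import sysconfig


def optimization_level(flags):
    """Return the last -O<level> flag in a compiler flag string, or None.

    GCC and Clang honour the last -O option given, so the last match wins.
    """
    known_levels = ("", "0", "1", "2", "3", "s", "g", "fast")
    level = None
    for token in (flags or "").split():
        if token.startswith("-O") and token[2:] in known_levels:
            level = token
    return level


# CFLAGS recorded at interpreter build time; may be None on some platforms.
print(optimization_level(sysconfig.get_config_var("CFLAGS")))
```

A result of None or -O0 for the flags actually passed to the extension build would suggest that the compiled code is running unoptimized.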