clawpack / classic

Classic single-grid Fortran routines
http://www.clawpack.org
BSD 3-Clause "New" or "Revised" License

OpenMP parallelization of 3D Classic #51

Closed gradylemoine closed 10 years ago

gradylemoine commented 10 years ago

This pull request contains OpenMP parallelization of the 3D Clawpack Classic code, at the level of step3/step3ds. This is joint work with Kenny Roche (@rochekj). The bulk of the computation should be parallelized, but I haven't done weak scaling tests yet. Strong scaling is okay, but not great: on the 100^3 and 200^3 grids I used with the tests/acoustics3d and tests/acoustics3d_dimsplit examples, I observed roughly 60% parallel efficiency or better (with higher efficiency for fewer threads, as one would expect). Because of the way the code is parallelized, results should be bit-for-bit reproducible independent of the number of threads used. I encourage rigorous testing.
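For readers wondering how a threaded update can be bit-for-bit independent of the thread count, here is a minimal sketch of one common way to get that property. It is not the code in this pull request: the function names, the 1D three-point stencil, and the block size are all hypothetical. The idea is that the block decomposition is fixed in advance, each block reads only the old state, and each block writes to its own disjoint range, so the arithmetic performed for any given cell never depends on how many threads are running.

```python
from concurrent.futures import ThreadPoolExecutor


def update_block(q_old, start, stop):
    """Update cells [start, stop) with a periodic 3-point average.

    Only the old state is read, so the result for each cell is the same
    no matter which thread runs this block or in what order blocks run.
    """
    n = len(q_old)
    out = []
    for i in range(start, stop):
        left = q_old[(i - 1) % n]
        right = q_old[(i + 1) % n]
        out.append(0.25 * left + 0.5 * q_old[i] + 0.25 * right)
    return start, out


def step(q_old, block_size, num_threads):
    """Advance one step using a fixed block decomposition (hypothetical)."""
    n = len(q_old)
    # The blocks depend only on the grid and block size, never on num_threads.
    blocks = [(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    q_new = [0.0] * n
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for start, values in pool.map(lambda b: update_block(q_old, *b), blocks):
            # Each block writes a disjoint slice of the new state.
            q_new[start:start + len(values)] = values
    return q_new


q0 = [((7 * i) % 13) / 13.0 for i in range(100)]
results = [step(q0, 8, t) for t in (1, 2, 4)]
print(results[0] == results[1] == results[2])
```

A design that instead accumulated per-thread partial sums and combined them at the end would generally not have this property, because floating-point addition is not associative and the grouping would change with the thread count.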

gradylemoine commented 10 years ago

Forgot to mention -- I also rewrote some of step3.f90 for better cache locality. Single-threaded performance should be slightly improved with the parameters set as they currently are. A few percent additional gain in single-threaded performance is possible by increasing the block size parameters (blksiz_i, blksiz_j, blksiz_k), although small block sizes are better for parallel load balancing.
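The trade-off between block size and load balancing can be illustrated with a toy model. This is only a sketch under stated assumptions: it assumes the grid is cut into cubic blocks of `blksiz` cells per side and that whole blocks are dealt out evenly by count, which may not match how step3.f90 actually schedules work. The function name `load_balance` and the sample values are hypothetical.

```python
import math


def load_balance(n_cells, blksiz, num_threads):
    """Fraction of thread time spent busy in a toy block-scheduling model.

    Assumes an n_cells^3 grid cut into cubic blocks of blksiz cells per side,
    with every block costing the same and blocks handed out evenly by count.
    """
    blocks_per_dim = math.ceil(n_cells / blksiz)
    total_blocks = blocks_per_dim ** 3
    # The busiest thread determines the wall-clock time.
    max_blocks_per_thread = math.ceil(total_blocks / num_threads)
    return total_blocks / (num_threads * max_blocks_per_thread)


for blksiz in (10, 50, 100):
    print(blksiz, load_balance(100, blksiz, 6))
```

In this model, many small blocks leave every thread with nearly the same amount of work, while a few large blocks can leave some threads idle. Larger blocks, on the other hand, amortize per-block overhead, which fits the small single-threaded gain described above.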

gradylemoine commented 10 years ago

Just checked weak scaling on espresso.amath. I'm reasonably pleased with the results. I used the tests/acoustics3d case, and kept the amount of work per thread per timestep roughly constant. Here's what I got ("grid size" is the number of cells in all three directions):

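| Grid size | Threads | Time (s) | Efficiency |
|-----------|---------|----------|------------|
| 100       | 1       | 287      | 100%       |
| 126       | 2       | 385      | 94%        |
| 159       | 4       | 571      | 80%        |
| 182       | 6       | 670      | 78%        |
| 200       | 8       | 760      | 76%        |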
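The numbers in the table can be cross-checked with a short script. Two things here are inferences from the data rather than statements from the thread: that "work per thread per timestep" means cells per thread (grid size cubed divided by threads), and that the efficiency column normalizes run time by grid size, as it would if the number of timesteps grew linearly with grid size. Under those assumptions the sketch below reproduces the reported percentages.

```python
# (grid size per direction, threads, wall-clock time in seconds), from the table
runs = [
    (100, 1, 287),
    (126, 2, 385),
    (159, 4, 571),
    (182, 6, 670),
    (200, 8, 760),
]

base_n, _, base_time = runs[0]


def cells_per_thread(n, threads):
    """Work per thread per timestep, assuming it is proportional to cell count."""
    return n ** 3 / threads


def weak_efficiency(n, time_s):
    """Single-thread time per unit grid size divided by this run's.

    Assumes the number of timesteps scales linearly with n.
    """
    return (base_time / base_n) / (time_s / n)


efficiencies = [round(100 * weak_efficiency(n, t)) for n, _, t in runs]
work = [cells_per_thread(n, th) for n, th, _ in runs]

for (n, th, t), w, e in zip(runs, work, efficiencies):
    print(f"{n:4d} {th:2d} {t:4d} s  {w:10.0f} cells/thread  {e:3d}%")
```

The cells-per-thread values all fall within about half a percent of 10^6, which matches the statement that work per thread per timestep was kept roughly constant.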