Closed: cdholmes closed this issue 3 years ago
Thanks @cdholmes. This seems to be a newer feature of OpenMP; I was not familiar with it before. We should find out which versions of ifort and gfortran support it. I can work on that as time allows.
FYI, some references:
My opening comment in this thread and PR #496 suggested using Collapse ( N ) where N = 2. On further reflection, I think it is probably better to pick the largest value of N such that the innermost parallel loop contains significant computational work in each iteration. There's no downside to choosing large N as long as the work inside the loop is much greater than the OpenMP overhead. For chemistry, the KPP solver in each box is guaranteed to be compute intensive, so setting N = 3 (parallelize I,J,L loops) makes sense. With 50 CPUs on my machine, N = 3 is negligibly faster than N = 2, but N = 3 is more future-proof as CPU core counts continue to increase.
I will look into using COLLAPSE after the 13.0.0 version is released. Seems promising.
Also, I have been working on eliminating excess computations (i.e., skipping terms that evaluate to 1) in the various rate-law equations in gckpp.kpp. See issue https://github.com/geoschem/geos-chem/issues/567 for more info. Removing unnecessary computations that are done in every (I,J,L) box should also yield some computational speedup.
This feature request has been moved to #639. Closing this issue.
Overview
I found that adding an OpenMP collapse directive reduced the FlexChem runtime by 12% (with 20 threads) to 32% (with 50 threads), using GEOS-Chem version 12.9.2 and the Intel 19.0.5 compiler. I suspect that other parts of GEOS-Chem could similarly benefit. PR #496 demonstrates the change in FlexChem. In most OpenMP blocks, GEOS-Chem currently parallelizes only the outer loop. Collapse( 2 ) will instead merge the outer two loops into a single iteration space and divide that among the threads.
Action items
I suggest adding collapse directives to other time-consuming parallel blocks, as GCST time allows.