geoschem / geos-chem

GEOS-Chem "Science Codebase" repository. Contains GEOS-Chem science routines, run directory generation scripts, and interface code. This repository is used as a submodule within the GCClassic and GCHP wrappers, as well as in other modeling contexts (external ESMs).
http://geos-chem.org

[DISCUSSION] OpenMP Collapse can significantly improve speed #497

Closed cdholmes closed 3 years ago

cdholmes commented 3 years ago

Overview

I found that adding an OpenMP COLLAPSE clause reduced the FlexChem runtime by between 12% (with 20 threads) and 32% (with 50 threads), using GEOS-Chem version 12.9.2 and the Intel 19.0.5 compiler. I suspect that other parts of GEOS-Chem could benefit similarly. PR #496 demonstrates the change in FlexChem. In most OpenMP blocks, GEOS-Chem currently parallelizes only the outermost loop. COLLAPSE( 2 ) instead merges the outer two loops into a single iteration space and divides that among the threads.
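
As a rough illustration of why the benefit can grow with thread count, here is a minimal Python sketch that models how evenly a loop's iterations divide among threads. It is only a model, not OpenMP itself: it assumes every iteration costs the same and that iterations are split as evenly as possible, and the loop extents are made up rather than taken from any GEOS-Chem grid. The helper names `static_counts` and `efficiency` are hypothetical. In Fortran the clause itself is written on the directive, e.g. `!$OMP PARALLEL DO COLLAPSE( 2 )`.

```python
from math import prod


def static_counts(n_iter, n_threads):
    """Iterations per thread when n_iter equal-cost iterations are split as evenly as possible."""
    base, extra = divmod(n_iter, n_threads)
    return [base + (1 if t < extra else 0) for t in range(n_threads)]


def efficiency(n_iter, n_threads):
    """Useful work divided by total thread-time; the busiest thread sets the wall time."""
    return n_iter / (n_threads * max(static_counts(n_iter, n_threads)))


# Hypothetical loop extents, outermost first (assumption, not a real GEOS-Chem grid).
extents = (72, 46, 72)

for n_threads in (20, 50):
    outer_only = efficiency(extents[0], n_threads)          # parallelize outer loop only
    collapse_2 = efficiency(prod(extents[:2]), n_threads)   # outer two loops merged
    print(f"{n_threads} threads: outer loop only {outer_only:.2f}, "
          f"COLLAPSE(2) {collapse_2:.2f}")
```

With these made-up extents, the outer-loop-only efficiency drops from 0.90 at 20 threads to 0.72 at 50 threads, because 72 iterations cannot be shared evenly among 50 threads, while the collapsed loop stays close to 1. Real timings also depend on the schedule, uneven work per grid box, and memory effects, none of which this model captures.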

Action items

I suggest adding collapse directives to other time-consuming parallel blocks, as GCST time allows.

yantosca commented 3 years ago

Thanks @cdholmes. This seems to be a newer feature of OpenMP; I was not familiar with it before. We should find out which versions of ifort and gfortran support it. I can work on that as time allows.

yantosca commented 3 years ago

FYI, some references:

  1. https://stackoverflow.com/questions/43684422/how-does-openmp-collapse-works-internally
  2. https://stackoverflow.com/questions/44197573/performance-openmp-collapse-vs-no-collapse-for-large-nested-loops
  3. https://stackoverflow.com/questions/28482833/understanding-the-collapse-clause-in-openmp
  4. https://nanxiao.gitbooks.io/openmp-little-book/content/posts/collapse-clause.html

cdholmes commented 3 years ago

My opening comment in this thread and PR #496 suggested using COLLAPSE( N ) with N = 2. On further reflection, I think it is better to pick the largest N for which each iteration of the innermost collapsed loop still contains significant computational work. There is no downside to a large N as long as the work inside the loop is much greater than the OpenMP overhead. For chemistry, the KPP solver in each grid box is guaranteed to be compute-intensive, so N = 3 (parallelizing the I, J, and L loops) makes sense. With 50 CPUs on my machine, N = 3 is only negligibly faster than N = 2, but N = 3 is more future-proof as CPU core counts continue to increase.
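
The trade-off in choosing N can be sketched with a small Python model. It assumes equal-cost iterations, an even split among threads, and zero per-iteration overhead, so it only shows the load-balance side of the argument. The loop extents and the 1000-thread case are hypothetical, as are the helper names `collapsed_iterations` and `efficiency`. Note also that COLLAPSE applies only to loops that are perfectly nested.

```python
from math import ceil, prod


def collapsed_iterations(extents, n):
    """Size of the merged iteration space when the outer n loops are collapsed."""
    return prod(extents[:n])


def efficiency(n_iter, n_threads):
    """Useful work divided by total thread-time under an even split of iterations."""
    return n_iter / (n_threads * ceil(n_iter / n_threads))


# Hypothetical loop extents, outermost first (assumption, not a real GEOS-Chem grid).
extents = (72, 46, 72)

for n_threads in (50, 1000):
    for n in (1, 2, 3):
        n_iter = collapsed_iterations(extents, n)
        print(f"{n_threads:5d} threads, COLLAPSE({n}): "
              f"{n_iter:6d} iterations, efficiency {efficiency(n_iter, n_threads):.3f}")
```

In this model N = 2 and N = 3 are nearly indistinguishable at 50 threads, while at much larger thread counts only N = 3 leaves enough iterations per thread to balance well. That is consistent with the observation above, but it is not a substitute for timing the real code.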

yantosca commented 3 years ago

I will look into using COLLAPSE after the 13.0.0 version is released. Seems promising.

Also, I have been working on eliminating excess computations (i.e., not computing terms that evaluate to 1) in the various rate-law equations in gckpp.kpp. See issue https://github.com/geoschem/geos-chem/issues/567 for more info. Removing unnecessary computations that are done in every (I,J,L) box should also yield some speedup.

yantosca commented 3 years ago

This feature request has been moved to #639. Closing this issue.