As it turns out, the results above were not an apples-to-apples comparison. The KPP integrator routine in the Dev code was calling UPDATE_RCONST (i.e. the rate-law update function) within the integrator on each integration step. Although the time to compute the reaction rates improved, the overall integration time increased.

A new test shows that removing the useless computations in the rate-law functions actually decreases the run time by about 4.5 minutes per 7-day simulation (or almost 20 minutes for a 31-day benchmark simulation). See this link: https://github.com/geoschem/geos-chem/issues/598#issuecomment-778237911
Overview
I have been trying to do some profiling of GEOS-Chem Classic on a SLURM partition consisting of Cascade Lake processors. Each node has 24 physical cores, but hyperthreading is activated (i.e. each physical core presents 2 logical cores), so there are 48 logical cores per node. However, the hyperthreading seems to interfere with attempts to profile the code with the TAU Performance Profiler (i.e. inaccurate results are obtained).

Based on some advice from FAS Research Computing, I've tried to set up runs on the partition such that each OpenMP thread binds to a physical core (and not a logical core). On Cascade Lake, cores 0-23 correspond to the physical cores, and cores 24-47 are the hyperthreaded (logical) cores.
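One way to confirm this numbering (the commands below are my addition for illustration, not part of the original tests) is to inspect the CPU topology on a compute node:

```bash
# List each logical CPU with its physical core and socket; logical CPUs
# that share a CORE value are hyperthread siblings.
lscpu --extended=CPU,CORE,SOCKET

# Alternatively, show which logical CPUs share a physical core with CPU 0.
cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
```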
In OpenMP 4.5 and higher you should be able to bind OpenMP threads to physical CPUs with the standard thread-affinity environment variables, but in my experience this doesn't work well.
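For reference, a minimal sketch of that standard approach, assuming the variables in question are OMP_PLACES and OMP_PROC_BIND (the portable affinity controls in OpenMP 4.x):

```bash
export OMP_NUM_THREADS=24   # one thread per physical core
export OMP_PLACES=cores     # each place is a physical core (hyperthread siblings share a place)
export OMP_PROC_BIND=close  # pin threads to consecutive places
```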
Another way to do this is to set the OMP_CPU_AFFINITY environment variable to specify which cores you want to use (i.e. make sure you only use core numbers 0-23 and skip 24-47). I tried the setups below.

Experiments
1. Gfortran 10.2 with environment variable OMP_CPU_AFFINITY="0:23"
Compiler
Run script commands
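The original commands aren't reproduced here; the setup was presumably along the lines of this sketch, in which the SLURM options and executable name (gcclassic) are my assumptions:

```bash
#!/bin/bash
#SBATCH -N 1                     # one node (assumption)
#SBATCH -c 24                    # 24 CPUs for the single task (assumption)

export OMP_NUM_THREADS=24
export OMP_CPU_AFFINITY="0:23"   # the setting tested in this experiment

./gcclassic                      # GEOS-Chem Classic executable (assumption)
```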
Logfile output
Analysis
I'm not sure if this is exactly what we want. It seems like we are using the hyperthreaded cores 24-38. (Or it could be that these are all physical cores and that, for whatever reason, the numbering scheme is not what we would expect.)
It seems that there is another environment variable we can try: instead of OMP_CPU_AFFINITY we can set GOMP_CPU_AFFINITY. GOMP is the GNU OpenMP library that is bundled with the GCC and GFortran compilers. Let's see if this makes a difference.

2. Gfortran 10.2 with GOMP_CPU_AFFINITY="0-23"
Compiler
Run script commands:
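Again, the original commands aren't reproduced; presumably this was the same setup as Experiment 1 with only the affinity variable changed (a sketch, same assumptions as above):

```bash
export OMP_NUM_THREADS=24
export GOMP_CPU_AFFINITY="0-23"   # GNU-specific affinity list: logical CPUs 0 through 23

./gcclassic
```

Note that GOMP_CPU_AFFINITY takes a list of logical CPU numbers, so restricting the list to 0-23 should keep threads off the hyperthread siblings 24-47.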
Logfile output:
Analysis
Each thread is bound to a CPU, but multiple threads are bound to the same CPU within the range 0-23.
3. ifort 19.0.5 with OMP_CPU_AFFINITY="0-23"
Compiler:
Job file commands:
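A sketch under the same assumptions as above; the KMP_AFFINITY line is my addition (Intel's OpenMP runtime prints each thread's binding when it is set to verbose), not part of the original test:

```bash
export OMP_NUM_THREADS=24
export OMP_CPU_AFFINITY="0-23"   # the setting tested in this experiment
export KMP_AFFINITY=verbose      # optional: report thread-to-CPU bindings in the log

./gcclassic
```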
Logfile output:
Analysis:
This seems to do what we want.
Recommendation:
It seems that using GOMP_CPU_AFFINITY="0-23" for GNU Fortran will avoid the hyperthreading cores. (We should really confirm this.) Also, OMP_CPU_AFFINITY="0-23" seems to avoid the hyperthreading cores with Intel Fortran on the Cascade Lake partition.
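Putting the two recommendations side by side (a sketch; set whichever variable matches the compiler used to build GEOS-Chem):

```bash
export OMP_NUM_THREADS=24         # use only the 24 physical cores

# GNU Fortran (gfortran):
export GOMP_CPU_AFFINITY="0-23"

# Intel Fortran (ifort):
export OMP_CPU_AFFINITY="0-23"
```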
I will rerun the profiles with these commands to see if the profiling output makes more sense.