EESSI / test-suite

A portable test suite for software installations, using ReFrame
GNU General Public License v2.0

Process binding for pure MPI tests #138

Open casparvl opened 2 months ago

casparvl commented 2 months ago

I'm seeing some strange performance issues with the GROMACS test on our system: occasionally, it just runs 10 times slower. Looking at htop, I see individual cores not being used, even though I would have expected each core to be running a single process (the GROMACS test is pure MPI).

The generated job script looks like this for a 2-node test:

#!/bin/bash
#SBATCH --job-name="rfm_EESSI_GROMACS_bd8ac108"
#SBATCH --ntasks=256
#SBATCH --ntasks-per-node=128
#SBATCH --cpus-per-task=1
#SBATCH --output=rfm_job.out
#SBATCH --error=rfm_job.err
#SBATCH --time=0:30:0
#SBATCH -p rome
#SBATCH --export=None
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module load GROMACS/2024.1-foss-2023b
export OMP_NUM_THREADS=1
curl -LJO https://github.com/victorusu/GROMACS_Benchmark_Suite/raw/1.0.0/HECBioSim/Crambin/benchmark.tpr
mpirun -np 256 gmx_mpi mdrun -nb cpu -s benchmark.tpr -dlb yes -npme -1 -ntomp 1

I checked the binding of each process. To my surprise, the processes were bound to NUMA domains, which I would never have expected. According to https://www.open-mpi.org/doc/current/man1/mpirun.1.php, when the number of processes is larger than 2, binding should be to socket.
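One way to check this kind of binding from inside a job is sketched here, assuming Linux compute nodes. Each rank prints the CPU list it is allowed to run on, as reported by /proc/self/status. The helper name show_binding is hypothetical; OMPI_COMM_WORLD_RANK is the environment variable Open MPI sets for each launched process.

```shell
# Hypothetical helper: print the CPUs the calling process is allowed to run on.
# Assumes Linux (reads /proc/self/status). The awk child inherits the affinity
# mask of the shell that calls it, so the reported list is that of the rank.
show_binding() {
    sb_rank="${OMPI_COMM_WORLD_RANK:-?}"   # "?" when not launched by Open MPI
    sb_cpus="$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status)"
    echo "rank ${sb_rank} on $(uname -n): cpus ${sb_cpus}"
}
```

Placed in a small script that ends with a call to show_binding, and launched with the same mpirun line in place of gmx_mpi, this prints one line per rank. A single core number per rank means binding to core; a range covering a whole NUMA domain or socket means the wider binding described above. Open MPI's mpirun also has a --report-bindings option that prints the bindings directly.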

Note that binding to either a NUMA domain or a socket is potentially bad for the reproducibility of test performance, since processes can still migrate between the cores within that domain. To make performance predictable, I would just like to bind to core. I'm wondering if we shouldn't just call the set_compact_process_binding hook for this test... I'm not sure this is the cause of my performance variation, but enforcing binding to core (which is essentially what set_compact_process_binding does) seems like a good idea for the GROMACS test, and potentially others.
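For a quick manual comparison outside ReFrame, core binding can also be requested directly on the mpirun command line. This is only a sketch, assuming Open MPI's mpirun and its --map-by, --bind-to and --report-bindings options; the wrapper name mpirun_bind_core is hypothetical, and this is not necessarily how set_compact_process_binding achieves the binding.

```shell
# Hypothetical wrapper: run a command under Open MPI with each rank pinned to a core.
#   --map-by core     : place consecutive ranks on consecutive cores
#   --bind-to core    : pin each rank to the core it was mapped to
#   --report-bindings : have Open MPI print the resulting binding of every rank
mpirun_bind_core() {
    mbc_np="$1"   # first argument: number of MPI ranks
    shift         # remaining arguments: the command to launch
    mpirun -np "$mbc_np" --map-by core --bind-to core --report-bindings "$@"
}

# Example, using the launch line from the job script above:
#   mpirun_bind_core 256 gmx_mpi mdrun -nb cpu -s benchmark.tpr -dlb yes -npme -1 -ntomp 1
```

Running the benchmark a few times with and without the wrapper should show whether the occasional slow runs go away once each rank is pinned to a core.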

Right now, set_compact_process_binding is only used in the TensorFlow test, where it is quite essential (since that is a hybrid test).

boegel commented 2 months ago

Maybe @victorusu has some experience with this for GROMACS?

casparvl commented 2 months ago

See https://github.com/EESSI/test-suite/pull/139. I seem to get both better and more consistent performance with binding. Since reproducibility of the performance is important, I'd be in favor of enabling it (I'd probably even be in favor if the performance were worse, as long as it is more consistent :P).