Currently, for MPI configurations (run via benchcab spatial), running the same version of CABLE against itself sometimes does not produce bitwise-identical output.
Benchcab currently tests 4 different science configurations for all given CABLE versions, labelled S0, S1, S2 and S3. Sometimes one or more of these configurations reproduce the same results bitwise; however, it is unlikely that all configurations reproduce reliably.
Where differences occur, many variables have relative differences greater than 10% throughout the time series.
My guess is that an uninitialised memory access somewhere (e.g. #395, #396, #397) is causing non-deterministic behaviour. Currently the MPI executable crashes when run under ddt with the balanced memory debugging settings enabled.
Steps to reproduce (Gadi):
CABLE version used: main c125ede1eb9e7881e8bce72563992e1d43c685fe
Benchcab version used: 4.1.0
/scratch
:bench_example
and set the configuration file as follows:
modules: [ intel-compiler/2021.1.1, netcdf/4.7.4, openmpi/4.1.0 ]
EOF
module load conda/analysis3-24.04
benchcab spatial
module load nccmp
nccmp -d runs/spatial/tasks/crujra_access_R_S0/archive/output000/cable_out.nc
nccmp -d runs/spatial/tasks/crujra_access_R_S1/archive/output000/cable_out.nc
nccmp -d runs/spatial/tasks/crujra_access_R_S2/archive/output000/cable_out.nc
nccmp -d runs/spatial/tasks/crujra_access_R_S3/archive/output000/cable_out.nc
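The per-configuration comparison could also be scripted so that it is clear at a glance which of S0 to S3 reproduce. The following is only a sketch under some assumptions: the two outputs being compared live under two separate directories that share the task layout shown above (the function name `compare_runs`, its two directory arguments and the `CMP_CMD` override are hypothetical and not part of benchcab), and the comparison command exits with a non-zero status when the files differ.

```shell
# Sketch: report which science configurations reproduce between two runs.
# Assumption: $1 and $2 are hypothetical directories that both contain the
# task layout crujra_access_R_<config>/archive/output000/cable_out.nc.
# Assumption: the comparison command returns non-zero when files differ.
# CMP_CMD is a hypothetical override; it defaults to "nccmp -d".
compare_runs() {
    run_a=$1
    run_b=$2
    status=0
    for config in S0 S1 S2 S3; do
        rel="crujra_access_R_${config}/archive/output000/cable_out.nc"
        # Left unquoted on purpose so "nccmp -d" splits into command and flag.
        if ${CMP_CMD:-nccmp -d} "$run_a/$rel" "$run_b/$rel" >/dev/null 2>&1; then
            echo "$config: identical"
        else
            echo "$config: differs"
            status=1
        fi
    done
    # Non-zero return if any configuration failed to reproduce.
    return "$status"
}
```

It would be called with the two directories to compare, e.g. `compare_runs first_run/tasks second_run/tasks` (both paths are placeholders), and prints one line per configuration.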