COSIMA / access-om3

ACCESS-OM3 global ocean-sea ice-wave coupled model

not bitwise reproducible #40

Closed aekiss closed 1 year ago

aekiss commented 1 year ago

The current MOM6-CICE6 config (and presumably others) is not bitwise reproducible. Compare these two runs:

/scratch/v45/aek156/access-om3/archive/MOM6-CICE6_ACCESS-OM3_repro_test_1
/scratch/v45/aek156/access-om3/archive/MOM6-CICE6_ACCESS-OM3_repro_test_2
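
One way to compare two archive directories like these is to checksum every file and list the ones that differ. The following is only a minimal sketch: the function names are hypothetical, and a raw byte comparison is stricter than necessary, since logs and any file that embeds a timestamp will differ even when the model state is identical, so in practice it would probably be restricted to restart or output files.

```python
import hashlib
import sys
from pathlib import Path


def file_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large output files are not loaded whole.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[str(path.relative_to(root))] = h.hexdigest()
    return digests


def differing_files(run_a, run_b):
    """Return relative paths that are missing from one run or differ in content."""
    a, b = file_digests(run_a), file_digests(run_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


if __name__ == "__main__":
    # Usage: python compare_runs.py <run_dir_1> <run_dir_2>
    for name in differing_files(sys.argv[1], sys.argv[2]):
        print(name)
```

An empty listing would mean the two runs are byte-for-byte identical.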

aekiss commented 1 year ago

Maybe srcTermProcessing=1 and termOrder=srcseq are set by default somewhere?

MartinDix commented 1 year ago

Okay, so that's confirmed: compiling ESMF in debug mode leads to reproducible runs.

I'm not sure how critical ESMF is for performance, but it might be worth finding out which optimization level can be safely used to compile it.

My ESMF build had (from /scratch/tm70/mrd599/esmf-8.3.0/lib/libg/Linux.intel.x86_64_medium.openmpi.default/esmf.mk)

ESMF_F90COMPILEOPTS=-g -traceback -check arg_temp_created,bounds,format,output_conversion,stack,uninit -fPIC -debug minimal -assume realloc_lhs -m64 -mcmodel=medium -pthread -threads  -qopenmp
ESMF_CXXCOMPILEOPTS=-std=c++11 -g -traceback -Wcheck -fPIC -debug minimal -m64 -mcmodel=medium -pthread  -qopenmp

I think all the work is done in C++ routines and so the F90 options are unlikely to affect the reproducibility. The Intel compiler default is -O2, so the C++ options here don't seem very restrictive. Did the spack build use -O3?

micaeljtoliveira commented 1 year ago

> The Intel compiler default is -O2

@MartinDix I'm afraid this is not the case here, as the -g option is set. According to the Intel compiler manual:

> This option turns off option -O2 and makes option -O0 the default unless option -O2 (or higher) is explicitly specified in the same command line.

This holds for both the Fortran and C/C++ compilers.

Here are the options used by the Spack build, taken from its esmf.mk file:

ESMF_F90COMPILEOPTS= -O -fPIC -debug minimal -assume realloc_lhs -m64 -mcmodel=small -pthread -threads  -qopenmp
ESMF_CXXCOMPILEOPTS= -std=c++11 -O -DNDEBUG -fPIC -debug minimal -m64 -mcmodel=small -pthread  -qopenmp

In this case, -O is equivalent to -O2.
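
To make the comparison between the two builds concrete, here is a small sketch that applies the rule quoted above to a flag string. It is a deliberate simplification (the function name is hypothetical, and it only recognises `-g`, `-O` and `-O0` to `-O3`, ignoring variants such as `-Ofast`), not a model of the full compiler driver.

```python
import re


def effective_opt_level(flags: str) -> str:
    """Estimate the Intel optimization level implied by a flag string.

    Simplified rule: an explicit -O flag wins (plain -O means -O2);
    otherwise -g makes -O0 the default; otherwise the default is -O2.
    """
    tokens = flags.split()
    explicit = [t for t in tokens if re.fullmatch(r"-O[0-3]?", t)]
    if explicit:
        last = explicit[-1]  # assume the last explicit level takes effect
        return "O2" if last == "-O" else last[1:]
    if "-g" in tokens:
        return "O0"
    return "O2"


debug_cxx = ("-std=c++11 -g -traceback -Wcheck -fPIC -debug minimal "
             "-m64 -mcmodel=medium -pthread -qopenmp")
spack_cxx = ("-std=c++11 -O -DNDEBUG -fPIC -debug minimal "
             "-m64 -mcmodel=small -pthread -qopenmp")

print(effective_opt_level(debug_cxx))  # O0
print(effective_opt_level(spack_cxx))  # O2
```

So the debug build that reproduced was effectively unoptimized, while the Spack build was compiled at -O2.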

I suspect the non-bitwise reproducibility will get fixed by setting the floating-point model to precise or strict.
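
As general background on why the floating-point model can matter (a generic illustration, not a diagnosis of what ESMF actually does): floating-point addition is not associative, so an optimization that is allowed to reorder a sum can change the last bits of the result. The Intel default floating-point model permits such value-unsafe transformations, whereas precise and strict do not.

```python
# The same three numbers summed in two different orders.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left == right)  # False
print(repr(left), repr(right))
```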

micaeljtoliveira commented 1 year ago

I can confirm that adding -fp-model=precise to both C++ and Fortran flags along with -O2 when compiling ESMF yields bitwise reproducible runs.

I would say we now have a solution to this problem, so I'm closing the issue.