CABLE-LSM / CABLE

Home to the CABLE land surface model and its documentation
https://cable.readthedocs.io/en/latest/

Diverging results when enabling architecture specific optimisation flags #307

Open SeanBryan51 opened 3 months ago

SeanBryan51 commented 3 months ago

Note the following was written in the context of #238.

I have done a sanity check to verify that results are identical across different Intel compiler versions, so that we can remove the separate intel-llvm compiler case in build.bash and use the latest version of the Intel compiler when $compiler == intel.

Model outputs were generated using the TRENDY_V12 configuration for a reduced domain (extent="64.0,66.0,60.0,62.0" in run_TRENDY.sh) and compared using nccmp -d <file_a> <file_b>. The files compared are cru_out_cable_1901_2022.nc and cru_out_casa_1901_2022.nc from each serial CABLE run, as these two files are the final products.
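The comparison described above could be scripted roughly as follows. This is a minimal sketch, not the script used in this issue: the run directory arguments and the function names are hypothetical, and it assumes nccmp exits with a non-zero status when the data differ.

```python
import subprocess
from pathlib import Path

# File names taken from the comparison described in this issue.
OUTPUT_FILES = ["cru_out_cable_1901_2022.nc", "cru_out_casa_1901_2022.nc"]


def build_nccmp_commands(run_a, run_b):
    """Return one `nccmp -d` command per final output file.

    run_a and run_b are hypothetical directories holding the outputs of two runs.
    """
    return [
        ["nccmp", "-d", str(Path(run_a) / name), str(Path(run_b) / name)]
        for name in OUTPUT_FILES
    ]


def compare_runs(run_a, run_b):
    """Run each comparison and return the file names flagged by nccmp.

    Assumption: a non-zero exit status means the data differ. A missing file
    or other nccmp error would also land in this list.
    """
    flagged = []
    for name, cmd in zip(OUTPUT_FILES, build_nccmp_commands(run_a, run_b)):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            flagged.append(name)
    return flagged
```

Calling compare_runs("run_ifort_2021", "run_ifort_19") with two real output directories would return an empty list if both files match.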

In short, I found no differences when using ifort (IFORT) 2021.8.0 20221119 vs ifort (IFORT) 19.0.5.281 20190815. However, I found large differences (greater than 10%) across multiple variables (e.g. see Ebal plot below) when enabling the optimisation flags -march=broadwell -axSKYLAKE-AVX512,CASCADELAKE,SAPPHIRERAPIDS -diag-disable=15009 for the latest Intel compiler (intel-compiler-llvm/2023.0.0 module):

[Image: Ebal plot]

Curve A shows the run without the optimisation flags and curve B shows the run with the optimisation flags.
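For reference, one way to put a number on a difference such as "greater than 10%" is a maximum relative difference between the two series. The sketch below is an assumption about how this could be measured, not the method used for the plot above, and the sample values are made up rather than CABLE output.

```python
def max_relative_difference(series_a, series_b, eps=1e-12):
    """Largest |a - b| / max(|a|, |b|) over paired values.

    Pairs where both values are smaller than eps in magnitude are skipped
    to avoid dividing by (nearly) zero.
    """
    worst = 0.0
    for a, b in zip(series_a, series_b):
        scale = max(abs(a), abs(b))
        if scale < eps:
            continue
        worst = max(worst, abs(a - b) / scale)
    return worst


# Made-up values for illustration only (not model output).
ebal_a = [0.0, 1.0, 2.0, 4.0]
ebal_b = [0.0, 1.0, 2.5, 4.0]
print(max_relative_difference(ebal_a, ebal_b))  # 0.5 / 2.5 = 0.2, i.e. 20%
```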

It is probably safe to upgrade the Intel compiler to the intel-compiler-llvm/2023.0.0 module (ifort (IFORT) 2021.8.0 20221119), but I can't say the same for the architecture-specific optimisations.

Originally posted by @SeanBryan51 in https://github.com/CABLE-LSM/CABLE/issues/238#issuecomment-2134450045

SeanBryan51 commented 3 months ago

Note that CABLE-POP_TRENDY appears to initialise the energy balance to zero at the start of each run and update it iteratively over time, which is why the two plots seem to start from the same value despite being initialised from different restarts:

[Image: energy balance plots for the 1700_1900 and 1901_2022 runs]

Plot A is the 1700_1900 run and plot B is the 1901_2022 run.

Whyborn commented 3 months ago

I had a look at some of the other variables, using the first climate spinup stage as the test stage. I found the same as you for the energy balance, but the other variables didn't show any difference (I didn't test for bitwise equivalence, but they are qualitatively the same). Here's a snapshot of some of the variables I picked: the longwave and shortwave radiation absorption, which I thought would play pretty strongly into the energy balance, and the water balance.

[Images: Stage-1-climate_restart-Wbal, Stage-1-climate_restart-SWnet, Stage-1-climate_restart-LWnet]

SeanBryan51 commented 2 months ago

I have created four separate executables of CABLE, each enabling one of the offending compiler flags (-march=broadwell -axSKYLAKE-AVX512,CASCADELAKE,SAPPHIRERAPIDS -diag-disable=15009) to narrow down which options cause the divergence. Executables which broke/preserved binary equivalence with the unoptimised run are listed below:

  1. -march=broadwell + default release flags: breaks binary equivalence with unoptimised run
  2. -axSKYLAKE-AVX512 + default release flags: breaks binary equivalence with unoptimised run
  3. -axCASCADELAKE + default release flags: breaks binary equivalence with unoptimised run
  4. -axSAPPHIRERAPIDS + default release flags: preserves binary equivalence with unoptimised run

Note: cases where binary equivalence was broken are still bitwise reproducible.
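A per-flag check like the one listed above can be recorded with a small helper. This is a sketch under assumptions: the function names and file paths are hypothetical, and it compares raw bytes via SHA-256. Raw-byte comparison may flag netCDF files whose data match but whose metadata differ, so a data-only tool such as nccmp -d remains the safer check for model outputs.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserves_equivalence(reference, outputs_by_flag):
    """Map each flag to True if its output is byte-identical to the reference.

    reference: output file from the unoptimised run (hypothetical path).
    outputs_by_flag: dict of compiler flag -> output file from that build.
    """
    reference_digest = sha256_of(reference)
    return {
        flag: sha256_of(path) == reference_digest
        for flag, path in outputs_by_flag.items()
    }
```

Running the same helper on two outputs from the same executable would cover the separate reproducibility check mentioned in the note above.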

mcuntz commented 2 months ago

That is indeed intriguing. Does that mean anything? I mean does that mean that there is a little inconsistency in the code somewhere or is it just a "compiler thing"? Can one actually switch on/off specific instruction sets used? Something like -axSSE3 to further narrow down the culprit? With the old Cray compiler, one could see what the compiler was rewriting when it was optimising (that was for vectorisation). Perhaps one can also "see" what the Intel compiler is rewriting.
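As a general illustration of how a "compiler thing" can change results without any inconsistency in the source code: floating-point addition is not associative, so a vectorised reduction that accumulates in separate lanes can round differently from a strict left-to-right loop. The sketch below only demonstrates that effect in isolation. Nothing in this thread establishes that it is the cause of the CABLE divergence, and the lane-based function is a simplified stand-in for what a vectorising compiler might generate.

```python
def sum_left_to_right(xs):
    """Accumulate in source order, as a scalar loop would."""
    total = 0.0
    for x in xs:
        total += x
    return total


def sum_in_lanes(xs, lanes=2):
    """Mimic a vectorised reduction: each lane accumulates every
    `lanes`-th element, then the lane totals are combined."""
    partial = [0.0] * lanes
    for i, x in enumerate(xs):
        partial[i % lanes] += x
    return sum_left_to_right(partial)


values = [1.0e16, 1.0, -1.0e16, 1.0]

print(sum_left_to_right(values))  # 1.0: the first 1.0 is lost when added to 1.0e16
print(sum_in_lanes(values))       # 2.0: the large terms cancel in their own lane
```

Both answers come from the same inputs and the same arithmetic, which is also why each build can remain bitwise reproducible while differing from the others.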