Open jdha opened 1 year ago
Update: run.stat reproducibility issues using master with gnu-mpich at -O2, -O1 and -O0; also with -finit-local-zero. Tried with both 68 and 19 node options.
Next step: re-run tests with Cray
Same happens with the Cray compiler. A quick test with the GYRE test config looks to be fine.
I had another look at the reproducibility issue yesterday and figured out it was in the ice code. I had thought I’d initialised all variables to zero – but as I’d only done this test with GNU I thought I’d check with the Cray compiler. I recompiled and it was reproducible. So I guess I don’t know my compiler options very well or at least don’t know how to look them up!
For gfortran I used: -finit-local-zero … but maybe I’ve missed something. For the Cray I used -e0 and this seemed to solve my issue. Not sure it’s worth me going through the array of metO modifications to the ice code to see what’s going on as we’re going to move to 4.2.1 soon.
@jdha do you know if it is v4 or v4.2 that doesn't initialise to zero correctly? If the latter then it could be worth flagging. Also note that a bug has been discovered in the v4.2 ice-ocean drag that may be important (https://forge.nemo-ocean.eu/nemo/nemo/-/issues/333). It is considered important enough to trigger a 4.2.2 release (very soon). In the meantime the fix is 2 lines in iceupdate.F90, and I've added it to the NPD repo.
@atb299 I'm naively assuming it's a GO8 thing as both 4.0.4 and 4.2.1 pass SETTE (but that doesn't mean all code is fully tested)
another quick look at this:
not found the source yet but if you set ln_pnd_alb in namelist_ice_cfg_template to false you should get reproducibility
I've also tested that if ln_pnd_alb is true and you set zafrac_pnd = 0._wp in place of zafrac_pnd = MIN( pafrac_pnd(ji,jj,jl), 1._wp - zafrac_snw ) on line 132 of icealb.F90 you can also get reproducibility
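For anyone following along, here is a scalar sketch of the two variants of that line. It is only a Python stand-in for the Fortran expression quoted above: the function name and the `force_zero` switch are hypothetical, and it does not reproduce the non-reproducibility itself, it just shows what the substitution removes.

```python
def pond_albedo_fraction(pafrac_pnd, zafrac_snw, force_zero=False):
    """Scalar stand-in for the zafrac_pnd assignment in icealb.F90.

    pafrac_pnd : pond fraction passed in for this cell/category
    zafrac_snw : snow-covered fraction already computed
    force_zero : hypothetical switch mimicking the test edit zafrac_pnd = 0._wp
    """
    if force_zero:
        # Test variant: ponds contribute nothing to the albedo weighting.
        return 0.0
    # Original expression: pond fraction capped by the snow-free fraction.
    return min(pafrac_pnd, 1.0 - zafrac_snw)
```

With `force_zero=True` the result no longer depends on `pafrac_pnd` at all, which is consistent with the pond fraction input being the thing to look at next.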
there are two calls to ice_alb: ice_update_flx and ice_sbc_flx
it appears that the reproducibility issue occurs in the latter - although from what I can see all inputs have been initialised
For info: repeating this test with the 4.2.1 code passes the run.stat test
run.stat for master (4.0.4) and 4.2 branch (using 4.0.4) are identical for a short run of 20 time steps (with ln_pnd_alb=.false.)
4.0.4 run.stat from master does not match 4.0.4 run.stat from 4.2 branch
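The run.stat comparisons above can be scripted. A minimal sketch, assuming the check is a plain line-by-line text comparison of two run.stat files (it does not parse the columns, and the helper names are made up for illustration):

```python
from itertools import zip_longest


def first_mismatch(lines_a, lines_b):
    """Return (line_number, a, b) for the first differing line, else None.

    Trailing whitespace is ignored. A missing line (one file shorter than
    the other) is reported as None on the shorter side.
    """
    pairs = zip_longest(lines_a, lines_b)  # pads the shorter input with None
    for n, (a, b) in enumerate(pairs, start=1):
        if a is None or b is None or a.rstrip() != b.rstrip():
            return n, a, b
    return None


def compare_run_stat(path_a, path_b):
    """Compare two run.stat files on disk; the paths are supplied by the caller."""
    with open(path_a) as fa, open(path_b) as fb:
        return first_mismatch(fa.read().splitlines(), fb.read().splitlines())
```

Calling `compare_run_stat` on the run.stat from each run returns `None` when they match, or the first line where they diverge, which gives the time step at which reproducibility is lost.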