Open isabellaascione opened 5 months ago
Changes identified with NEMO5 beta release:
NEMO-ocean parameters
parameter | status at NEMO5 beta | GOSI10 at NEMO5 beta | GOSI10 trial at NEMO 4.2.2 (for comparison with GOSI10 at NEMO5 beta) | GOSI10 at NEMO 4.2.2 (standard) |
---|---|---|---|---|
namrun | ||||
ln_top | $${\color{green}new}$$ | TRUE (but used only with key_top) | not available | not available |
ln_rst_eos | $${\color{gray} N/A }$$ (GOSI10 only) | TRUE (ported from GOSI10 package) | TRUE | TRUE |
namdom | ||||
ln_linssh | $${\color{red}removed}$$ | N/A | FALSE | FALSE |
ln_shuman | $${\color{green}new}$$ | FALSE | not available | not available |
namlbc | ||||
ln_shlat2d | $${\color{gray} N/A}$$ (GOSI10 4.2.2 only) | N/A (not ported) | FALSE | TRUE |
rn_shlat | | 0 | 0 | -9999 |
namsbc | ||||
ln_dm2dc | $${\color{blue}preexisting}$$ | FALSE | FALSE | TRUE |
namsbc_blk | ||||
ln_MFS | $${\color{green}new}$$ | FALSE | not available | not available |
ln_prec_met | $${\color{green}new}$$ | FALSE | not available | not available |
rn_Cd_ia | $${\color{orange}renamed}$$ | rn_Cd_ia=1.4e-3 | rn_Cd_i=1.4e-3 | rn_Cd_i=1.4e-3 |
rn_Ce_ia | $${\color{orange}renamed}$$ | rn_Ce_ia=1.4e-3 | rn_Ce_i=1.4e-3 | rn_Ce_i=1.4e-3 |
rn_Ch_ia | $${\color{orange}renamed}$$ | rn_Ch_ia=1.4e-3 | rn_Ch_i=1.4e-3 | rn_Ch_i=1.4e-3 |
ln_Cx_ice_frm | $${\color{green}new}$$ | FALSE | not available | not available |
nn_frm | $${\color{green}new}$$ | set to default as per reference namelist, but not used because ln_Cx_ice_frm=.false. | not available | not available |
rn_Cs_io | $${\color{green}new}$$ | set to default as per reference namelist, but not used because ln_Cx_ice_frm=.false. | not available | not available |
rn_Cs_ia | $${\color{green}new}$$ | set to default as per reference namelist, but not used because ln_Cx_ice_frm=.false. | not available | not available |
rn_Cr_ia | $${\color{green}new}$$ | set to default as per reference namelist, but not used because ln_Cx_ice_frm=.false. | not available | not available |
rn_Cr_io | $${\color{green}new}$$ | set to default as per reference namelist, but not used because ln_Cx_ice_frm=.false. | not available | not available |
rn_Cf_ia | $${\color{green}new}$$ | set to default as per reference namelist, but not used because ln_Cx_ice_frm=.false. | not available | not available |
rn_Cf_io | $${\color{green}new}$$ | set to default as per reference namelist, but not used because ln_Cx_ice_frm=.false. | not available | not available |
namsbc_abl | ||||
ln_pga_abl | $${\color{green}new}$$ | false (only for ln_abl = T) | not available | not available |
rn_vfac | $${\color{green}new}$$ | (only for ln_abl = T) | not available | not available |
namsbc_blk | ||||
ln_crt_fbk | $${\color{blue}preexisting}$$ | TRUE | TRUE | FALSE |
rn_stau_a | $${\color{blue}preexisting}$$ | -2.90E-03 | -2.90E-03 | not used |
rn_stau_b | $${\color{blue}preexisting}$$ | 8.00E-03 | 8.00E-03 | not used |
rn_vfac | $${\color{gray} N/A}$$ (GOSI10 4.2.2 only) | N/A (not ported) | 0 | 1 |
namtra_qsr | ||||
nn_chldta | $${\color{blue}preexisting}$$ | 1 | 1 | 0 |
rn_chl_conc | $${\color{gray} N/A }$$ (GOSI10 4.2.2 only) | N/A (not ported) | not used | 0.1 |
nameos | ||||
rn_T0 | $${\color{green}new}$$ | 10 | hardcoded to 10 | hardcoded to 10 |
rn_S0 | $${\color{green}new}$$ | 35 | hardcoded to 35 | hardcoded to 35 |
namtra_adv | ||||
nn_fct_imp | $${\color{green}new}$$ | should be 2 (to replicate 4.2.2), but that option is broken, so set to 1 | not available | not available |
namtra_eiv | ||||
ln_eke_equ | ||||
namldf_eke | $${\color{green}new}$$ namelist with new parameters (not listed in this table) | unused because nn_aei_ijk_t=21 | not available | not available |
namvvl | ||||
nn_vvl_interp | $${\color{blue}preexisting}$$ | 2 | 2 | 0 |
namdyn_adv | ||||
ln_dynadv_ubs | $${\color{red}removed}$$ | N/A | FALSE | FALSE |
namdyn_vor | ||||
ln_dynvor_eeT | $${\color{red}removed}$$ | N/A | FALSE | FALSE |
namdyn_hpg | ||||
ln_hpg_zps | $${\color{red}removed}$$ | N/A | FALSE | FALSE |
namdyn_spg | ||||
ln_bt_av | $${\color{red}removed}$$ (now hardcoded to TRUE as ll_bt_av in dynspg_ts.F90) | N/A | TRUE | TRUE |
rn_htau_scaling | $${\color{gray} N/A}$$ (GOSI10 4.2.2 only) | ported | 1 | 1 |
nammpp | ||||
ln_mppdelay | $${\color{green}new}$$ | FALSE | not available | not available |
nn_hls | $${\color{blue}preexisting}$$ | 2 | 1 (should be 2, but does not have much impact) | 1 |
namzdf | ||||
ln_zdfiwm | $${\color{blue}preexisting}$$ | TRUE | TRUE | FALSE |
ln_zdftmx | $${\color{gray}N/A }$$ (GOSI10 4.2.2 only) | N/A (not ported) | FALSE | TRUE |
rn_avm0 | $${\color{blue}preexisting}$$ | 1.40E-06 | 1.40E-06 | 1.20E-04 |
rn_avt0 | $${\color{blue}preexisting}$$ | 1.00E-10 | 1.00E-10 | 1.20E-05 |
Sea-Ice parameters
parameter | parameter status in NEMO5 beta | GOSI10 at NEMO5 beta | GOSI10 trial at NEMO 4.2.2 (for comparison with GOSI10 at NEMO5 beta) | GOSI10 at NEMO 4.2.2 (standard) |
---|---|---|---|---|
rn_cio= 1.0e-2 | $${\color{orange}renamed}$$ | rn_Cd_io= 1.0e-2 | rn_cio= 1.0e-2 | rn_cio= 1.0e-2 |
ln_icedS | $${\color{red}removed}$$ | N/A | ln_icedS=.true. | ln_icedS=.true. |
ln_flushing | $${\color{green}new}$$ | TRUE | Not available as option in the namelist but included in the code | Not available as option in the namelist but included in the code |
ln_drainage | $${\color{green}new}$$ | TRUE | Not available as option in the namelist but included in the code | Not available as option in the namelist but included in the code |
rn_sinew | $${\color{green}new}$$ | when using nn_icesal=2, initially set to 0.3 as recommended for NEMO5 beta, then set to 0.25 as recommended for NEMO5 RC. rn_sinew*sss_1d is used instead of rn_simax, so results will differ from 4.2.2 | N/A | N/A |
rn_simax | $${\color{red}removed}$$ | N/A | 20 | 20 |
rn_sal_himin | $${\color{green}new}$$ | 0 | Hardcoded to 0. | Hardcoded to 0. |
ln_pnd_rain | $${\color{green}new}$$ | FALSE | N/A | N/A |
rn_pnd_hl_min | $${\color{green}new}$$ | 0.005 | Not available as parameter, but hardcoded to 0.005 | Not available as parameter, but hardcoded to 0.005 |
rn_pnd_hl_max | $${\color{green}new}$$ | 0.015 | Not available as parameter, but hardcoded to 0.015 | Not available as parameter, but hardcoded to 0.015 |
rn_alb_lpnd | $${\color{green}new}$$ | set to 0.36, which should reproduce the 4.2.2 behaviour | N/A | N/A |
nn_liquidus | $${\color{green}new}$$ | 1 | N/A | N/A |
nn_pnd_brsal | $${\color{gray} N/A }$$ (GOSI10 4.2.2 only ) | N/A, use nn_liquidus instead | 0 | 0 |
BUG
The parameter nn_fct_imp was introduced with the 5-beta release. Its default value is 1; to replicate the behaviour of 4.2.2 it should be set to 2, but that option appears to be broken. I have let Daley know, and an issue should be opened on the main NEMO repository.
update: Daley proposed a fix: https://forge.nemo-ocean.eu/nemo/nemo/-/issues/499
The r5282_nemo5rc_changes_and_fix_intel_xios MOCI branch should be used to run GOSI10 after it has been rebased onto the NEMO 5 release candidate (currently it is a branch of the beta). This is required because the NEMO 5 release candidate has removed `lib_cray.f90`, necessitating an update to the fcm-make configurations in MOCI. The above branch contains these changes as well as those from r4684_fix_intel_xios, which we currently use.

Suites using GOSI10 based on the NEMO 5 release candidate will need the following update in `app/fcm_make_ocean/rose-app.conf`:

```
config_root_path=fcm:MOCI.xm/branches/dev/daleycalvert/r5282_nemo5rc_changes_and_fix_intel_xios
nemo_version_file=nemo5.0-version.cfg
```
The NEMO 5 RC has been released. Moving from NEMO5 beta to NEMO5 RC, the new reference namelist recommends using rn_sinew=0.25 instead of 0.3 (as in NEMO5 beta) when using nn_icesal=2.
Additionally, the following changes happened to the namelists:
namelist / parameter | change |
---|---|
namhsb | removed namelist |
ln_diahsb | removed |
namdom | |
ln_crs | removed |
namcfg | |
ln_use_jattr | removed |
namcrs | removed |
nn_factx | removed |
nn_facty | removed |
nn_binref | removed |
ln_msh_crs | removed |
nn_crs_kz | removed |
ln_crs_wn | removed |
namc1d | it now includes the parameters from the namelists namc1d_dyndmp and namc1d_uvd, which were present in 5-beta. namc1d_dyndmp and namc1d_uvd no longer exist as namelists |
nam_tide | |
ln_tide_dia | removed |
namtra_eiv | |
ln_ldfeiv_dia | removed |
namflo | removed |
ln_floats | removed |
jpnfl | removed |
jpnnewflo | removed |
ln_rstflo | removed |
nn_writefl | removed |
nn_stockfl | removed |
ln_argo | removed |
ln_flork4 | removed |
ln_ariane | removed |
ln_flo_ascii | removed |
nam_dia25h | removed |
ln_dia25h | removed |
namnc4 | |
ln_nc4zip | default is .false. |
I tested GOSI10 p2.0 with ORCA025. In the run with NEMO 5 beta I was using rn_sinew=0.3 with nn_icesal=2, as recommended for the NEMO5-beta release. The run crashed after about 25 years, in July 2001, with errors in outputting the variable sfxice.
Looking into the code, the nn_icesal=2 option is no longer coded as it was in NEMO 4.2.2. I ran a 1-day test and compared the variable sfxice with the reference at GOSI10 p2.0. The results (below, left picture) show that nn_icesal=2 with rn_sinew=0.3 produces different results from GOSI10 p2.0 (middle picture).
I backported the formulas used in NEMO 4.2.2 into the branch (https://forge.nemo-ocean.eu/devs/ukmo/gosi/gosi/-/commit/b5e4620c40b9fc064630b0007a22d58c5effa9b8). With this change, the nn_icesal=2 option no longer uses the parameterisation based on rn_sinew.
The results in the picture below show that the sfxice variable (right picture) is consistent with GOSI10 p2.0 (middle picture) when using the formulation from NEMO 4.2.2.
With NEMO5 RC the recommendation is now to use rn_sinew=0.25 with nn_icesal=2.
Emma also advised to try nn_icesal=4, which uses a default of rn_sinew=0.75 in the reference namelist.
Following the above, I set up the following 3 long runs:
suite | description | outcome |
---|---|---|
u-di347 | GOSI10pX.0 (based on NEMO5RC), using nn_icesal=2 and rn_sinew=0.25 | crashed shortly after 20 years, in April 1997, with error: NetCDF: Numeric conversion not representable Unable to write data given the location id: 65536 and the variable whose id: 58 and name: sfxice |
u-di581 | GOSI10pX.0 adapted to NEMO5RC, using nn_icesal=4 and rn_sinew=0.75 | crashed on 19890601 with the same error as above |
u-di568 | GOSI10pX.0 adapted to NEMO5RC, but backporting the formulations used in NEMO 4.2.2; using nn_icesal=2 and re-introducing rn_simax=20.0 (rn_sinew is not used in this run) | crashed on 19890601 with the same error as above |
@emmafiedler
@edblockley
@alex.west49
NEMO system meeting on 04/09/2024
There has been discussion about the validation effort. The full minutes of the meeting can be found at (login required): https://forge.nemo-ocean.eu/nemo/nst/-/wikis/meetings/2024_09_4th
Notes for the validation efforts (from the minutes):
CNRS:
ran ORCA1 for 120 years for v4.2 (MLF) and v5.0 (MLF and RK3). Found an issue with sea-ice quantities when using RK3: under investigation (eORCA1 monitoring here).
no particular issues found in coupled ocean-atmosphere global and regional configurations with NEMO v5.0 (but MLF vs RK3 was not checked).
used XIOS3 following Andrew's instructions in the draft user guide (here), and it worked really well.
NOC:
ORCA025: no problems noticed in the first 5 years.
CMCC:
Did not notice any issues for the Black and Mediterranean Sea regional configs during the first year of simulation; wave coupling will be validated soon.
UKMET:
GOSI10 config: ORCA1, ORCA025 and ORCA12 grids (not exactly identical to the v5.0 RC tagged version); crashes after 20 years of integration with ORCA025 using MLF (and delayed communications disabled)
similar AMOC between NEMO 4.2 and 5.0
SSS difference dipole between the Arctic and North Atlantic
global scores improve with NEMO 5
restartability issue to be investigated by Daley; it may be linked to the crash. Clem suggests comparing the ice namelists between NEMO 4.2 and 5.0 and checking salt-ice related options.
NEMO system meeting on 25/09/2024
the full meeting minutes are available at (login required):
https://forge.nemo-ocean.eu/nemo/nst/-/wikis/meetings/2024_09_25th
notes related to validation efforts:
LOCEAN: need to re-run the simulations to double-check, as the results are curious and not reproducible by the other groups.
NOC: ran BENCH (eORCA12-like, no output, north fold) on ARCHER2. Performance results are very good (still a benefit up to 60,000 cores). The north fold is still costly, leading to ups and downs in the scalability curve; this depends on the number of lines in the north fold subdomains. Some work is still needed to improve that.
MetO: 25 years, then an instability, possibly in SI3; not able to restart beyond it. Over the 25 years of simulation, generally more ice in NEMO 5.0 compared to NEMO 4.2. The instabilities are sensitive to compilation options (fp-model). XIOS failed, but only because SI3 sent too large a value; this is a model error, not an XIOS error. S.T. offered to look into the issue.
Issue with RK3: the way the Coriolis forces are computed in the time-splitting loop has been reconsidered. The branch is there and needs to be validated by others (S.T. and G.M.). It improves the stability of ORCA2. The issue is present for both RK3 and MLF.
This is a summary of my investigation into the issue described in Isabella's comment above.
I was able to replicate the issue in an experiment without the GOSI10 code (so just using the NEMO 5.0 release candidate). The following findings are based on this configuration.
The following error:

```
NetCDF: Numeric conversion not representable Unable to write data given the location id: 65536
and the variable whose id: 58 and name: sfxice
```

occurs when XIOS is unable to convert data to the precision required by the netCDF file when writing to it. This usually means that the data contains NaNs, infinite values, or very large/small values.

In this case, the reported diagnostic (`sfxice`) uses data from the SI3 salt flux `sfx`, which did indeed have very large values at some point during the cycle that crashed.
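As a minimal illustration of this failure mode (a sketch, not the actual XIOS code path): a double-precision value too large for a single-precision output variable overflows to infinity on conversion, which is exactly the kind of value netCDF reports as "not representable".

```python
import numpy as np

# A spuriously large model value, far beyond float32's maximum (~3.4e38).
big = np.float64(1e40)

# Converting to the file's single-precision type overflows to infinity
# (NumPy may emit an overflow warning); netCDF rejects such values.
as_f32 = np.float32(big)

print(np.isinf(as_f32))
```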
The NEMO grid is split into two parts: the internal domain, where the model computes values, and the surrounding halo points, which hold copies of data from neighbouring subdomains and boundaries.
In a global ORCA configuration, the east and west boundaries are periodic, the south boundary is closed and the north boundary is periodic with a north fold. The following figure shows a schematic of the north boundary for the U grid on a T-point north fold:
In this figure, the internal domain is the area below the thick black line between rows `jpj-1` and `jpj-2`. The halo points are copies of this data (compare cells with the same colours), mirrored around the central columns (`jpiglo/2` and `jpiglo/2+1`) and across the top row of the internal domain (`jpj-2`) so that the sign is changed; this is the north fold.

This particular north fold type (mirrored along T grid points) means that the right half of the top row of the inner domain contains duplicate data from the left half, as shown by the light grey shading. I will refer to this specific part of the north fold as the "internal north fold" below.

The internal north fold is not normally updated by `lbc_lnk`, i.e. the left half of row `jpj-2` is not copied onto the right half, as it is part of the internal domain. This assumes that the results of the calculation will be identical on both halves of the row, but this isn't necessarily true. In fact, this point is acknowledged within the NEMO code itself:
```fortran
! In theory we should exchange only nn_hls lines.
!
! However, some other points are duplicated in the north pole folding:
!  - c_NFtype='T', grid=T : half of the last line (jpiglo/2+2:jpiglo-nn_hls)
!  - c_NFtype='T', grid=U : half of the last line (jpiglo/2+1:jpiglo-nn_hls)
!  - c_NFtype='T', grid=V : all the last line nn_hls+1 and (nn_hls+2:jpiglo-nn_hls)
!  - c_NFtype='T', grid=F : all the last line (nn_hls+1:jpiglo-nn_hls)
!  - c_NFtype='F', grid=U : no points are duplicated
!  - c_NFtype='F', grid=V : half of the last line (jpiglo/2+1:jpiglo-nn_hls)
!  - c_NFtype='F', grid=F : half of the last line (jpiglo/2+1:jpiglo-nn_hls-1)
! The order of the calculations may differ for these duplicated points (as, for example jj+1 becomes jj-1)
! This explain why these duplicated points may have different values even if they are at the exact same location.
! In consequence, we may want to force the folding on these points by setting l_full_nf_update = .TRUE.
! This is slightly slower but necessary to avoid different values on identical grid points!!
```
This characteristic of the internal north fold is relevant in the next section, which describes the source of the issue.
The large `sfx` values were tracked down to large sea ice velocities (~45 m/s) calculated by the SI3 EVP rheology scheme (`icedyn_rhg_evp.F90`) that consistently appeared on the internal north fold. By comparison, the corresponding points on the other side of the north fold had much more realistic velocities. These differences manifest quickly (within 120 solver substeps on the first timestep) and the velocities on the internal north fold grow over subsequent timesteps.
These differences can be eliminated by adding the `ldfull=.TRUE.` argument to the `lbc_lnk` calls for `u_ice` and `v_ice` in `icedyn_rhg_evp.F90`. This argument causes data on the internal north fold to be updated from data on the other side of the north fold. Testing shows that this is sufficient to prevent the growth of instabilities in the calculation of `u_ice` and `v_ice` (using the global maximum Courant number as a measure).
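For reference, such a diagnostic can be sketched as follows (the function name and arguments are illustrative, not NEMO's actual implementation): the global maximum Courant number of the ice velocity field, given the grid spacings and timestep.

```python
import numpy as np

def max_courant(u_ice, v_ice, dx, dy, dt):
    """Global maximum Courant number for a 2-D velocity field.

    u_ice, v_ice: velocity components [m/s]; dx, dy: grid spacings [m];
    dt: timestep [s]. Values well above 1 indicate an unstable calculation.
    """
    return float(np.max(np.abs(u_ice) * dt / dx + np.abs(v_ice) * dt / dy))

# A realistic ice drift (~0.5 m/s) versus the ~45 m/s seen on the fold,
# on a nominal 10 km grid with a 600 s timestep:
stable   = max_courant(np.array([0.5]),  np.array([0.0]), 10e3, 10e3, 600.0)
unstable = max_courant(np.array([45.0]), np.array([0.0]), 10e3, 10e3, 600.0)
```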
In summary, the calculations underpinning `u_ice` and `v_ice` give very slightly different results on the internal north fold, compared to those on the other side of the north fold. The next section outlines a theory as to why.

It isn't clear to me why these differences would amplify over time; perhaps someone more familiar with the numerical aspects of SI3 would have an idea?
Floating point arithmetic is not exact: `a+b` is in practice equal to `a+b+E(a,b)` where `E` is a rounding error. Crucially, this means that floating point arithmetic is usually neither associative nor distributive. For example, `a+b+c` requires two operations to calculate, which could be represented by the compiler as either `(a+b)+c` or `a+(b+c)`. Because the rounding error in each case is different, the results will not be exactly the same. For this reason, brackets may need to be inserted to ensure consistent results by fixing the order of operations.
Fused Multiply-Add operations (FMAs) are an optimisation that allows arithmetic of the form `a*b+c` to be calculated using a single floating point operation. This is faster and more accurate than using two operations, i.e. `(a*b)+c`, but is no less prone to giving inconsistent results.

For example, `c = a**2 - b**2` could be calculated by the compiler as a series of 3 operations without FMAs: `c = (a*a) - (b*b)`. With FMAs this is reduced to 2 operations, one of which is an FMA, but there is ambiguity in the order of operations:
```fortran
d  = a*a
c1 = d - b*b   ! FMA

d  = b*b
c2 = a*a - d   ! FMA
```
`c1` and `c2` will not be exactly the same, again due to differing rounding errors in the floating point calculation. In fact, `c1` or `c2` may be negative even if `a == b`.
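This can be reproduced in Python by emulating an FMA with exact rational arithmetic (the `fma` helper below is hypothetical, written just for this demonstration; it computes `x*y + z` exactly and rounds once, which is what a hardware FMA does):

```python
from fractions import Fraction

def fma(x, y, z):
    # Emulated fused multiply-add: x*y + z computed exactly,
    # then rounded once to the nearest double.
    return float(Fraction(x) * Fraction(y) + Fraction(z))

a = b = 1.0 + 2.0**-27   # chosen so that a*a is not exactly representable

no_fma = (a * a) - (b * b)   # both products rounded identically: exactly 0
c1 = fma(-b, b, a * a)       # a*a rounded first, the b*b product fused
c2 = fma(a, a, -(b * b))     # b*b rounded first, the a*a product fused
```

With these values `no_fma` is exactly zero, while `c1` is negative and `c2` positive, despite `a == b`.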
The calculations of `u_ice` and `v_ice` on either side of the internal north fold give consistent results (and prevent the instability) when one of the following Intel compiler flags is used:

- `-O0` instead of `-O2`
- `-fp-model consistent` instead of `-fp-model precise`
- `-no-fma`
Each of these flags disables FMAs, implying that FMAs are responsible for the differing results on either side of the north fold. Why? My theory is that this is due to a differing order of operations, similar to an issue we encountered when moving from 1 row to 2 rows of halo points in NEMO 4.2.
Take the following arithmetic:

```fortran
c(ji,jj) = a(ji,jj-1) + a(ji,jj) + a(ji,jj+1)
```

This may be calculated by the compiler as:

```fortran
c(ji,jj) = a(ji,jj-1) + ( a(ji,jj) + a(ji,jj+1) )
```

On the other side of the north fold the calculation will be mirrored, so that the `a(ji,jj-1)` and `a(ji,jj+1)` values are swapped. The calculation then effectively becomes:

```fortran
c(ji,jj) = a(ji,jj+1) + ( a(ji,jj) + a(ji,jj-1) )
```

This gives a slightly different result, because the order of operations has changed. Brackets must therefore be inserted into the expression to fix the order of operations and ensure consistent results on either side of the north fold.
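A sketch of the effect in Python (the stencil values are illustrative, not model data): with the compiler's free association the mirrored stencil sums in a different order and the result changes, while a symmetric grouping is invariant under the swap, because two-operand float addition is commutative (only association is lost):

```python
a_jm1, a_j, a_jp1 = 0.1, 0.2, 0.3   # stencil values, illustrative only

# Free association, as the compiler might emit on each side of the fold:
original = a_jm1 + (a_j + a_jp1)    # a(ji,jj-1) + ( a(ji,jj) + a(ji,jj+1) )
mirrored = a_jp1 + (a_j + a_jm1)    # jj-1 and jj+1 swapped across the fold

# Forcing a symmetric grouping with brackets restores consistency:
sym_original = (a_jm1 + a_jp1) + a_j
sym_mirrored = (a_jp1 + a_jm1) + a_j
```

Here `original != mirrored`, but the symmetrically bracketed versions agree exactly.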
`lbc_lnk` copies data from the internal domain on one side of the north fold onto the halo points on the other side. This eliminates any difference in results on either side of the north fold. When moving to 2-point haloes, many `lbc_lnk` calls were replaced by direct calculation on the halo points. This led to results changing and required brackets to be inserted at various points in the code to recover the previous results.

I believe the same thing is happening with FMAs in `icedyn_rhg_evp.F90`: they are giving different results on either side of the north fold because the order of operations is different.
As discussed previously, brackets could be inserted to ensure a consistent order of operations, in this case with the aim of forcing the compiler to consistently implement FMAs in an arithmetic expression. However, this is not feasible for several reasons.
This issue is likely to be exclusive to our (the Met Office's) compiler and platform, meaning that it probably won't follow us onto the Gen 1 HPC (which does not have the Intel compiler). Therefore, a better solution is to use one of the Intel compiler flags that disable FMAs. This may slow the model down, but I think this would be an acceptable compromise for the short time we will be on the current HPC.
My suggestion is to use `-fp-model consistent` (replacing `-fp-model precise`) rather than `-no-fma`, since the latter flag is set by the former, and the former flag is used by other groups in the NEMO consortium.
Several experiments based on my NEMO 5.0rc "GOSI10-like" suite (`u-di687`) are currently running for 30 years, each of which is trialling a different fix. These experiments were all run from the same suite, but are being archived under different experiment IDs:

- `u-di687`: using the fix suggested to Isabella (call the `lbc_lnk` for `u_ice`/`v_ice` in `icedyn_rhg_evp.F90` with `ldfull=.true.`, but only for the last substep of the solver)
- `u-dj999`: using `-fp-model consistent` instead of `-fp-model precise`
- `u-dk000`: as for `u-di687`, but calling the `lbc_lnk` with `ldfull=.true.` for all substeps of the solver

All experiments failed at some point with an ice grid-point conservation issue. These conservation diagnostics are enabled with `ln_icediachk=.true.` and check that spurious changes in ice mass, salt and heat do not exceed a given tolerance. Globally-integrated changes and changes within a grid cell have different tolerances, with the latter being more strict and raising an error while the former only prints a warning. Examples of these warnings are:
```
iceupdate : violation mass cons. [kg] = 32780928625.5002
iceupdate : violation heat cons. [J] = -4.232530907118238E+015
iceupdate : violation mass cons. [kg] = 25777135366.6723
iceupdate : violation heat cons. [J] = -4.261563321882954E+015
```
We do not normally enable these checks (I'm not sure why); I enabled them while looking into the instability issue since they served as a useful diagnostic. I kept them enabled in the above 30-year tests in the hope that the conservation warnings would be resolved by the fixes being tested. However, while these warnings are now much less frequent (appearing for a couple of months every 2-6 years rather than most months every year), they are still occurring. Eventually, the grid-point conservation checks stopped all 3 experiments.
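The two-tier check described above can be sketched as follows (the names, units and tolerances are illustrative, not SI3's actual implementation):

```python
import numpy as np

def check_ice_conservation(dmass, cell_area, tol_global=1.0, tol_cell=1e-6):
    """dmass: spurious per-cell mass change [kg/m2]; cell_area: [m2].

    A globally-integrated violation only prints a warning; a violation
    within any single grid cell raises an error and stops the run.
    """
    global_violation = abs(float(np.sum(dmass * cell_area)))
    if global_violation > tol_global:
        print(f"iceupdate : violation mass cons. [kg] = {global_violation}")
    if np.any(np.abs(dmass) > tol_cell):
        raise RuntimeError("ice grid-point conservation check failed")
    return global_violation
```

A clean field passes silently, while a single bad grid cell raises the error; the stricter per-cell branch is what ended the three experiments.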
I believe these conservation issues are another characteristic of the Intel compiler. I have been running GOSI9p8.1 (based on NEMO 4.0.4) with the Cray (`u-dk107`) and Intel (`u-dk108`) compilers. The Intel compiler outputs similar warnings from the sea ice conservation diagnostics (it reports negative enthalpies rather than spurious mass/heat), but the Cray compiler (the standard for GOSI9 suites) outputs no warnings at all.
Given that we will be moving to Gen 1 and away from the Intel compiler, it's likely that the conservation warnings will disappear. So I mention this as something to keep an eye on when we start running on Gen 1, but I suspect it will no longer be an issue.
I have continued `u-di687` and `u-dj999` without the sea ice conservation diagnostics for the sake of completing 30 years. These should tell us if there is any systematic difference between these two fixes in terms of scientific results, as well as whether `-fp-model consistent` will significantly affect run times.
In the course of investigating the above issue, I also noticed that NEMO 5.0rc was not restartable, i.e. a 30-day run did not give the same results as 3 10-day runs. This issue has also been traced back to the use of FMAs, and the same compiler flags resolve it, although I don't know whether the issue is specific to the Intel compiler.
As for the other issue, it's likely that this one won't follow us onto Gen 1, so I again recommend using `-fp-model consistent` instead of `-fp-model precise` in the compiler flags.
Preliminary results running GOSI10 based on NEMO5 RC: ValidationJMMP_update_Nov.pptx
full validation notes:
ocean: https://gws-access.jasmin.ac.uk/public/jmmp/valid_ocean/u-dk027_10y_vs_u-dh157_10y/assess.html
sea-ice: https://gws-access.jasmin.ac.uk/public/jmmp/valid_ice/u-dk027_10y_si3_vs_u-dh157_10y_si3/assess.html
[x] NEMO5 beta released on https://forge.nemo-ocean.eu/devs/ukmo/gosi/gosi/-/commits/5.0-beta?ref_type=tags
[x] Create issue and branch on GOSI repository https://forge.nemo-ocean.eu/devs/ukmo/gosi/gosi/-/issues/19
[x] Port GOSI10 package changes to NEMO5 beta. GOSI10 will have different settings compared to standard GOSI10 at 4.2.2
[x] set up a rose suite to run with the modified version of GOSI10 at NEMO5 beta.
$${\color{red} ✗ }$$ run 30 years: the run crashed after 25 years with errors in writing the variable sfxice
$${\color{red} ✗ }$$ validation notes: these cannot be produced for the last 10 years of the experiment due to the failure described above