0.25 degree configuration with a different MOM6 parameterization compared to #101

minghangli-uni commented 7 months ago

MOM_input

Most of the updates are sourced from discussions in namelist-discussion and referencing OM2 technical report. Some major updates are highlighted below,

surface boundary layer parameterisation:
- Instead of KPP used in CVmix project, current configuration implements an energetic constrained parameterisation of the surface boundary layer (EPBL), providing vertically diffusivity and viscosity, and the depth of active mixing (BL thickness).
Removal of mesoscale eddy mixing parametersiations (e.g., USE_MEKE=False)
Set NK=50.
No parameters associated with TIDES
NUM_DIAG_COORDS=2 includes z_star and rho_2 (not sure if rho_2 is relevant for the current 0.25deg(https://github.com/COSIMA/MOM6-CICE6/issues/40#issuecomment-1996247784))
Timesteps
- baraclinic timestep DT: 1350s
- tracer timestep DT_THERM: 1350s
- coupling timestep: 1350s
- ice thermodynamic timestep: 1350s

With THERMO_SPANS_COUPLING = True, tracer timestep can be integer multiple of DT. However, as is mentioned in comments within MOM6_input, DT_THERM should be less than coupling timestep. So we may think about increasing the coupling timestep beyond 1350s (A good question proposed by @dougiesquire https://github.com/COSIMA/MOM6-CICE6/pull/48#discussion_r1550916267)

I think we maybe need to discuss whether we really want to be changing the coupling timestep here (and if so, what we want to do with MOM's thermodynamic timestep).

tracer timestep DT_THERM: 2700s can lead to a speedup of 20% for each model year. (The comparison is conducted with DIABATIC_FIRST = False)

Ice initial condition

The ice initial condition is set to "default". Additionally, another experiment with the ice initial condition (following https://github.com/COSIMA/access-om3/issues/50) from a 3-hour run of OM2 is running at the same time.

Other params

All the other parameters or namelists remain consistent and up-to-date to https://github.com/COSIMA/MOM6-CICE6/tree/025deg_jra55do_ryf_iss101 and OM2 technical report.

For example in ice_in,

&domain_nml
  block_size_x = 30
  block_size_y = 27
  distribution_type = "roundrobin"
  distribution_wght = "latitude"
  maskhalo_bound = .true.
  maskhalo_dyn = .true.
  maskhalo_remap = .true.
  max_blocks = 8
  ns_boundary_type = "tripole"
  nx_global = 1440
  ny_global = 1080
  processor_shape = "square-ice"

where block_size_x = 30, block_size_y = 27 are consistent with OM2 technical report. max_blocks=8 is evaluated by this snippet,

     if (max_blocks < 1) then
       max_blocks=( ((nx_global-1)/block_size_x + 1) *         &
                    ((ny_global-1)/block_size_y + 1) - 1) / nprocs + 1
       max_blocks=max(1,max_blocks)
       write(nu_diag,'(/,a52,i6,/)') &
         '(ice_domain): max_block < 1: max_block estimated to ',max_blocks
     endif

GADI consumption:

When running a model year using OM2 with a time step of DT=1350s, and OM3 with DT_THERM=DT=1350s, the service units required using OM2 and OM3 are approximately 8.39KSU and 11.2KSU, respectively. This indicates that the current OM3 is slower than OM2 by 33%.
However, when OM3 is configured with DT_THERM=2*DT=2700s, the service units required using OM3 (8.2KSU) become comparable to those of OM2 (8.39KSU).

Limitations

The current configuration runs sequentially and is restricted to a maximum of 288 CPU cores. If the number of CPU cores exceeds this limit, the model will hang without providing any useful information. I am still investigating this issue to determine the cause.

The branch is linked here, https://github.com/COSIMA/MOM6-CICE6/tree/025deg_jra55do_ryf_iss135

aekiss commented 7 months ago

Thanks @minghangli-uni for documenting those details. Is there a branch or draft PR you can link to with these changes?

adele-morrison commented 7 months ago

Have you tried running with DT_THERM > 2700? Seems like it could be a good way to improve performance, especially when we start running with BGC.

Regarding the comment above,

However, as is mentioned in comments within MOM6_input, DT_THERM should be less than coupling timestep.

There is a follow on from this in MOMinput saying "unless THERMO_SPANS_COUPLING is true, in which case DTTHERM can be an integer multiple of the coupling timestep". So I think it's fine to have DT_THERM > dt_cpld.

e.g. GFDL OM4_025 uses:

dt_cpld = 3600
DT = 900.0
DT_THERM = 7200.0
THERMO_SPANS_COUPLING = True

Have we tested the performance using similar timesteps to that? Could always reduce dt_cpld to 1800 if that's a worry.

minghangli-uni commented 7 months ago

The branch is linked here, https://github.com/COSIMA/MOM6-CICE6/tree/025deg_jra55do_ryf_iss135

dougiesquire commented 7 months ago

@adele-morrison, were your issues with DT_THERM related to the open boundaries?

adele-morrison commented 7 months ago

Yes, we only had a problem with DT_THERM in regional cases. I think large DT_THERM in global should be fine.

On 10 Apr 2024, at 10:25 am, Dougie Squire @.***> wrote:

@adele-morrison, were your issues with DT_THERM related to the open boundaries? — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

minghangli-uni commented 7 months ago

@adele-morrison

Have you tried running with DT_THERM > 2700?

I havent tried but I am planning to run a test with 4 and 8 times greater than DT.

e.g. GFDL OM4_025 uses: dt_cpld = 3600 DT = 900.0 DT_THERM = 7200.0 THERMO_SPANS_COUPLING = True Have we tested the performance using similar timesteps to that? Could always reduce dt_cpld to 1800 if that's a worry.

I've tried multiple tests. I observed that regardless of changes in other timesteps or the value of ntdt being 1 or 2, two errors consistently occurred in the first two years when dt_cpld was greater than or equal to 1800s.

FATAL from PE 100: write energy: Ocean velocity has been truncated too many times
(abort ice) error = (diagnostic abort) ERROR: bad departure points

dougiesquire commented 7 months ago

@minghangli-uni, can you easily test the timing and compare the outputs of runs with longer tracer timesteps? E.g. DT_THERM = 5400.0, 6750.0, 8100.0, keeping the MOM baraclinic timestep, CICE thermodynamic timestep and coupling timestep all set to 1350s.

minghangli-uni commented 7 months ago

Won't be a problem I think. As the current config is limited by the number of CPU cores (e.g. 288), it will take around 14 hours to receive one model year results.

dougiesquire commented 7 months ago

Won't be a problem I think. As the current config is limited by the number of CPU cores (e.g. 288), it will take around 14 hours to receive one model year results.

Are you saying that you think the model won't run any faster, or that you will do the runs to check (or something else)?

minghangli-uni commented 7 months ago

I am currently investigating the core cap issue. Additionally, I plan to examine whether increasing DT_THERM will impact physical fields. Based on these findings, we will determine the extent to which we can achieve a speedup.

dougiesquire commented 7 months ago

I am currently investigating the core cap issue.

What is this?

minghangli-uni commented 7 months ago

Limitations The current configuration runs sequentially and is restricted to a maximum of 288 CPU cores. If the number of CPU cores exceeds this limit, the model will hang without providing any useful information. I am still investigating this issue to determine the cause.

Look the above.

dougiesquire commented 7 months ago

@minghangli-uni some of the changes in the branch you've linked to this issue overlap with changes that are being made in https://github.com/COSIMA/MOM6-CICE6/pull/48 (i.e. the changes to NK, the timesteps, the CICE initial conditions and CICE block sizes are all done there). If you think the changes being proposed in that PR are not correct, please review/comment in that PR.

I think your other changes either require testing and/or require the same change across many repos. These are best handled one at a time. I suggest:

Open an issue for each proposed change
Address each change in a dedicated branch and associated PR

minghangli-uni commented 7 months ago

@dougiesquire This is a good point. Will follow your suggestion and implement changes accordingly.

aekiss commented 7 months ago

Are you planning to use @micaeljtoliveira's profiling tools for the test runs?

minghangli-uni commented 7 months ago

I will firstly work out the concurrent run and do test runs with increased CPU cores for MOM. This will reduce turn-around time and achieve results in a shorter walltime. Then I will use profiling tools (https://github.com/COSIMA/om3-utils/tree/profiling) to fine tune the optimal process layout for 025 deg configuration.

dougiesquire commented 6 months ago

@minghangli-uni can this be closed now or is there still a reason to keep it open?

minghangli-uni commented 6 months ago

I am happy to close it now. Thanks @dougiesquire

COSIMA / access-om3