Thanks @minghangli-uni for documenting those details. Is there a branch or draft PR you can link to with these changes?
Have you tried running with DT_THERM > 2700? Seems like it could be a good way to improve performance, especially when we start running with BGC.
Regarding the comment above,
However, as is mentioned in comments within MOM6_input, DT_THERM should be less than coupling timestep.
There is a follow-on from this in MOM_input saying "unless THERMO_SPANS_COUPLING is true, in which case DT_THERM can be an integer multiple of the coupling timestep". So I think it's fine to have DT_THERM > dt_cpld.
e.g. GFDL OM4_025 uses:
dt_cpld = 3600
DT = 900.0
DT_THERM = 7200.0
THERMO_SPANS_COUPLING = True
Have we tested the performance using similar timesteps to that? Could always reduce dt_cpld to 1800 if that's a worry.
The branch is linked here, https://github.com/COSIMA/MOM6-CICE6/tree/025deg_jra55do_ryf_iss135
@adele-morrison, were your issues with DT_THERM related to the open boundaries?
Yes, we only had a problem with DT_THERM in regional cases. I think large DT_THERM in global should be fine.
@adele-morrison
Have you tried running with DT_THERM > 2700?
I haven't tried yet, but I am planning to run tests with DT_THERM set to 4 and 8 times DT.
e.g. GFDL OM4_025 uses:
dt_cpld = 3600
DT = 900.0
DT_THERM = 7200.0
THERMO_SPANS_COUPLING = True
Have we tested the performance using similar timesteps to that? Could always reduce dt_cpld to 1800 if that's a worry.
I've tried multiple tests. I observed that, regardless of changes in other timesteps or the value of ntdt being 1 or 2, two errors consistently occurred in the first two years when dt_cpld was greater than or equal to 1800s:
FATAL from PE 100: write energy: Ocean velocity has been truncated too many times
(abort ice) error = (diagnostic abort) ERROR: bad departure points
@minghangli-uni, can you easily test the timing and compare the outputs of runs with longer tracer timesteps? E.g. DT_THERM = 5400.0, 6750.0, 8100.0, keeping the MOM baroclinic timestep, CICE thermodynamic timestep and coupling timestep all set to 1350s.
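As a side note (not from the thread), the suggested values are exact integer multiples of the 1350 s baroclinic/coupling timestep, which is what the THERMO_SPANS_COUPLING rule quoted above requires; a minimal check in Python:

```python
# Illustrative check: the proposed tracer timesteps should be integer multiples
# of the 1350 s baroclinic/coupling timestep (per the MOM_input comment quoted above).
dt = 1350.0                                  # MOM baroclinic and coupling timestep [s]
for dt_therm in (5400.0, 6750.0, 8100.0):
    ratio = dt_therm / dt
    assert ratio == int(ratio), f"DT_THERM={dt_therm} is not a multiple of DT={dt}"
    print(f"DT_THERM = {dt_therm:.0f} s = {int(ratio)} x DT")
```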
I don't think that will be a problem. As the current config is limited to 288 CPU cores, it takes around 14 hours to get one model year of results.
Are you saying that you think the model won't run any faster, or that you will do the runs to check (or something else)?
I am currently investigating the core cap issue. Additionally, I plan to examine whether increasing DT_THERM will impact physical fields. Based on these findings, we will determine the extent to which we can achieve a speedup.
I am currently investigating the core cap issue.
What is this?
Limitations: The current configuration runs sequentially and is restricted to a maximum of 288 CPU cores. If the number of CPU cores exceeds this limit, the model will hang without providing any useful information. I am still investigating this issue to determine the cause.
See the quote above.
@minghangli-uni some of the changes in the branch you've linked to this issue overlap with changes that are being made in https://github.com/COSIMA/MOM6-CICE6/pull/48 (i.e. the changes to NK, the timesteps, the CICE initial conditions and CICE block sizes are all done there). If you think the changes being proposed in that PR are not correct, please review/comment in that PR.
I think your other changes either require testing and/or require the same change across many repos. These are best handled one at a time. I suggest:
@dougiesquire This is a good point. Will follow your suggestion and implement changes accordingly.
Are you planning to use @micaeljtoliveira's profiling tools for the test runs?
I will first work out the concurrent run and do test runs with increased CPU cores for MOM. This will reduce turn-around time and achieve results in a shorter walltime. Then I will use the profiling tools (https://github.com/COSIMA/om3-utils/tree/profiling) to fine-tune the optimal process layout for the 0.25deg configuration.
@minghangli-uni can this be closed now or is there still a reason to keep it open?
I am happy to close it now. Thanks @dougiesquire
MOM_input
Most of the updates are sourced from discussions in namelist-discussion and from the OM2 technical report. Some major updates are highlighted below:
- USE_MEKE = False
- NK = 50
- TIDES
- NUM_DIAG_COORDS = 2 includes z_star and rho_2 (not sure if rho_2 is relevant for the current 0.25deg, see https://github.com/COSIMA/MOM6-CICE6/issues/40#issuecomment-1996247784)
- DT: 1350s
- DT_THERM: 1350s. With THERMO_SPANS_COUPLING = True, the tracer timestep can be an integer multiple of DT. However, as is mentioned in comments within MOM6_input, DT_THERM should be less than the coupling timestep, so we may think about increasing the coupling timestep beyond 1350s (a good question proposed by @dougiesquire, https://github.com/COSIMA/MOM6-CICE6/pull/48#discussion_r1550916267). See the sketch after this list.
- DT_THERM: 2700s can lead to a speedup of 20% for each model year (the comparison is conducted with DIABATIC_FIRST = False).
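To make the timestep relationships above concrete, here is a minimal sketch (illustrative only; the check_timesteps function is hypothetical, the 1350 s and 2700 s values are from this config, and the rules follow the MOM_input comments discussed in this thread):

```python
# Illustrative check of the timestep relationships discussed above.
def check_timesteps(dt, dt_therm, dt_cpld, thermo_spans_coupling):
    """Return a list of problems with the chosen timesteps (empty if none)."""
    problems = []
    if dt_therm % dt != 0:
        problems.append("DT_THERM should be an integer multiple of DT")
    if dt_therm > dt_cpld:
        if not thermo_spans_coupling:
            problems.append("DT_THERM > coupling timestep requires THERMO_SPANS_COUPLING = True")
        elif dt_therm % dt_cpld != 0:
            problems.append("with THERMO_SPANS_COUPLING, DT_THERM should be an integer "
                            "multiple of the coupling timestep")
    return problems

# Current 0.25deg settings (DT = DT_THERM = dt_cpld = 1350 s): no problems expected.
print(check_timesteps(dt=1350, dt_therm=1350, dt_cpld=1350, thermo_spans_coupling=False))
# Doubling DT_THERM to 2700 s requires THERMO_SPANS_COUPLING = True.
print(check_timesteps(dt=1350, dt_therm=2700, dt_cpld=1350, thermo_spans_coupling=True))
```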
Ice initial condition
The ice initial condition is set to "default". Additionally, another experiment with the ice initial condition taken from a 3-hour run of OM2 (following https://github.com/COSIMA/access-om3/issues/50) is running at the same time.
Other params
All the other parameters and namelists remain consistent with, and up to date with, https://github.com/COSIMA/MOM6-CICE6/tree/025deg_jra55do_ryf_iss101 and the OM2 technical report.
For example, in ice_in, block_size_x = 30 and block_size_y = 27 are consistent with the OM2 technical report, and max_blocks = 8 is evaluated by a snippet (see the sketch below).
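The original snippet is not preserved in this export; as an illustration only, here is a sketch of the usual way max_blocks is estimated for CICE. The 1440 x 1080 global grid size and the ncpus value (taken here as the number of PEs assigned to CICE) are assumptions, not values taken from the thread:

```python
import math

# Illustrative sketch (not the original snippet): estimate CICE max_blocks as the
# total number of blocks divided by the number of PEs, rounded up.
nx_global, ny_global = 1440, 1080          # assumed 0.25deg global grid size
block_size_x, block_size_y = 30, 27        # values from ice_in above
ncpus = 240                                # hypothetical number of CICE PEs

nblocks_x = math.ceil(nx_global / block_size_x)   # 48
nblocks_y = math.ceil(ny_global / block_size_y)   # 40
max_blocks = math.ceil(nblocks_x * nblocks_y / ncpus)
print(max_blocks)                          # -> 8 for these assumed values
```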
GADI consumption

For OM2 with DT = 1350s and OM3 with DT_THERM = DT = 1350s, the service units required using OM2 and OM3 are approximately 8.39 KSU and 11.2 KSU, respectively. This indicates that the current OM3 is slower than OM2 by 33%. However, when OM3 is configured with DT_THERM = 2*DT = 2700s, the service units required using OM3 (8.2 KSU) become comparable to those of OM2 (8.39 KSU).
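A quick check of the quoted slowdown, using only the KSU figures above:

```python
# Relative cost of OM3 vs OM2 from the KSU figures quoted above.
om2, om3_1350, om3_2700 = 8.39, 11.2, 8.2
print(f"{100 * (om3_1350 / om2 - 1):.0f}% slower with DT_THERM = 1350 s")      # ~33%
print(f"{100 * (om3_2700 / om2 - 1):.0f}% difference with DT_THERM = 2700 s")  # ~ -2%, i.e. comparable
```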
Limitations

The current configuration runs sequentially and is restricted to a maximum of 288 CPU cores. If the number of CPU cores exceeds this limit, the model will hang without providing any useful information. I am still investigating this issue to determine the cause.
The branch is linked here, https://github.com/COSIMA/MOM6-CICE6/tree/025deg_jra55do_ryf_iss135