Open aekiss opened 1 year ago
The latest commit on the 1deg_jra55do_ryf branch of MOM6-CICE6 crashes after model date = 0001-10-12T00:00:00 with
WARNING from PE 31: Extreme surface sfc_state detected: i= 329 j= 194 lon= 48.500 lat= 26.524 x= 48.500 y= 26.524 D= 1.1806E+01 SSH= 1.0551E+01 SST= 2.5600E+01 SSS= 4.5001E+01 U-= 0.0000E+00 U+=-1.0853E-02 V-= 0.0000E+00 V+= 7.3564E-03
This is at the head of the Persian Gulf. This crash seems nearly identical to the previous test (same location and date, nearly the same SSH): https://github.com/COSIMA/MOM6-CICE6/pull/5#issuecomment-1676484356.
Run dir:
/home/156/aek156/payu/MOM6-CICE6-1deg_jra55do_ryf
Changing DTBT from -0.95 to -0.5 (roughly halving the barotropic timestep) makes no difference:
WARNING from PE 31: Extreme surface sfc_state detected: i= 329 j= 194 lon= 48.500 lat= 26.524 x= 48.500 y= 26.524 D= 1.1806E+01 SSH= 1.0589E+01 SST= 2.5610E+01 SSS= 4.5001E+01 U-= 0.0000E+00 U+=-1.0604E-02 V-= 0.0000E+00 V+= 6.4269E-03
Also crashes identically with the latest ACCESS-OM3 commit 377c1fc (unsurprising, as this just adds the GPTL timing library).
The 1deg_jra55do_ryf and 1deg_jra55do_iaf configs of MOM6-CICE6 run happily with more lenient surface checks, using values from mom6-om4-025/MOM_input (RH column) instead of defaults (LH column):
Variable | archive/output008/MOM_parameter_doc.all | archive/output009/MOM_parameter_doc.all |
---|---|---|
bad_val_ssh_max | 20.0 | 50.0 |
bad_val_sss_max | 45.0 | 75.0 |
bad_val_sst_max | 45.0 | 55.0 |
bad_val_sst_min | -2.1 | -3.0 |
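To see which of these limits the reported point is actually crossing, here is a minimal Python sketch that parses one of the warning lines quoted in this thread and compares it with both columns of the table. The comparison rule (flag a value at or beyond its limit) is an assumption for illustration; the model's own check may include further conditions. The helper names `parse_sfc_state` and `exceeded` are hypothetical.

```python
import re

# Limits copied from the table above: defaults vs. the mom6-om4-025 values.
DEFAULT_LIMITS = {"ssh_max": 20.0, "sss_max": 45.0, "sst_max": 45.0, "sst_min": -2.1}
LENIENT_LIMITS = {"ssh_max": 50.0, "sss_max": 75.0, "sst_max": 55.0, "sst_min": -3.0}

# One of the warning lines quoted in this thread.
WARNING = (
    "WARNING from PE 31: Extreme surface sfc_state detected: i= 329 j= 194 "
    "lon= 48.500 lat= 26.524 x= 48.500 y= 26.524 D= 1.1806E+01 SSH= 1.0589E+01 "
    "SST= 2.5610E+01 SSS= 4.5001E+01 U-= 0.0000E+00 U+=-1.0604E-02 "
    "V-= 0.0000E+00 V+= 6.4269E-03"
)

# Matches "NAME= value" pairs such as "SSH= 1.0589E+01" or "U+=-1.0604E-02".
PAIR = re.compile(r"([A-Za-z][A-Za-z+-]*)=\s*([-+]?\d+\.?\d*(?:[Ee][-+]?\d+)?)")


def parse_sfc_state(line):
    """Return a dict of the NAME= value pairs found in a warning line."""
    return {name: float(value) for name, value in PAIR.findall(line)}


def exceeded(state, limits):
    """List the fields at or beyond their limits (assumed comparison rule)."""
    flags = []
    if abs(state["SSH"]) >= limits["ssh_max"]:
        flags.append("SSH")
    if state["SSS"] >= limits["sss_max"]:
        flags.append("SSS")
    if state["SST"] >= limits["sst_max"] or state["SST"] < limits["sst_min"]:
        flags.append("SST")
    return flags


state = parse_sfc_state(WARNING)
print("default limits:", exceeded(state, DEFAULT_LIMITS))
print("lenient limits:", exceeded(state, LENIENT_LIMITS))
```

Under that assumed rule, the quoted point is flagged only on SSS with the default limits (45.001 against a maximum of 45.0) and is not flagged at all with the more lenient values; SSH of about 10.6 is below both SSH limits.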
The MOM6-CICE6-WWIII configuration crashes at the same location as MOM6-CICE6 after MOM Date 1/10/08 00:00:00. The SSH and SST limits mentioned above are not implemented yet.
WARNING from PE 31: Extreme surface sfc_state detected: i= 329 j= 194 lon= 48.500 lat= 26.524 x= 48.500 y= 26.524 D= 1.1806E+01 SSH= 1.0461E+01 SST= 2.5886E+01 SSS= 4.5002E+01 U-= 0.0000E+00 U+=-4.0502E-02 V-= 0.0000E+00 V+=-2.2684E-02
Using more lenient checks from mom6-om4-025 allows the MOM6-CICE6 1° run to proceed for at least 2 years with no issues.
This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:
https://forum.access-hive.org.au/t/namelist-configuration-discussion-meeting/1917/9
Maybe fixing this will help? https://github.com/COSIMA/access-om3/issues/164
Thanks @aekiss. I am currently experiencing crashes in the MOM6-CICE6 1 deg IAF and RYF configs (main branch) after 3-4 months of runtime. The failures appear to have different causes (I have listed a few error logs below).
Test experiment 1 - IAF 1deg
WARNING from PE 0: diag_util_mod::opening_file: module/field_name (ocean_model_z/N2_int) NOT registered
WARNING from PE 0: diag_util_mod::opening_file: module/field_name (ocean_model_z/N2_int) NOT registered
WARNING from PE 0: diag_util_mod::opening_file: module/field_name (ocean_model_z/N2_int) NOT registered
Image PC Routine Line Source
libpthread-2.28.s 00001540FACF5CF0 Unknown Unknown Unknown
access-om3-MOM6-C 00000000039D6EBC diag_manager_mod_ 3234 diag_manager.F90
access-om3-MOM6-C 00000000039BDF18 diag_manager_mod_ 1466 diag_manager.F90
access-om3-MOM6-C 000000000350D06E mom_diag_manager_ 348 MOM_diag_manager_infra.F90
access-om3-MOM6-C 0000000003095BAE mom_diag_mediator 1784 MOM_diag_mediator.F90
access-om3-MOM6-C 0000000003094202 mom_diag_mediator 1625 MOM_diag_mediator.F90
access-om3-MOM6-C 00000000035A1AAD mom_dynamics_spli 1051 MOM_dynamics_split_RK2.F90
access-om3-MOM6-C 0000000002E49A33 mom_mp_step_mom_d 1173 MOM.F90
access-om3-MOM6-C 0000000002E4058B mom_mp_step_mom_ 853 MOM.F90
access-om3-MOM6-C 0000000002E1496D mom_ocean_model_n 633 mom_ocean_model_nuopc.F90
access-om3-MOM6-C 0000000002D3505D mom_cap_mod_mp_mo 1759 mom_cap.F90
Test experiment 2 - RYF 1 deg
WARNING from PE 0: diag_util_mod::opening_file: module/field_name (ocean_model_z/N2_int) NOT registered
WARNING from PE 0: diag_util_mod::opening_file: module/field_name (ocean_model_z/N2_int) NOT registered
WARNING from PE 0: diag_util_mod::opening_file: module/field_name (ocean_model_z/N2_int) NOT registered
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libpthread-2.28.s 00001482944AFCF0 Unknown Unknown Unknown
access-om3-MOM6-C 00000000037C6D16 mom_vert_friction 1713 MOM_vert_friction.F90
access-om3-MOM6-C 0000000003597CC4 mom_dynamics_spli 581 MOM_dynamics_split_RK2.F90
access-om3-MOM6-C 0000000002E49A33 mom_mp_step_mom_d 1173 MOM.F90
access-om3-MOM6-C 0000000002E4058B mom_mp_step_mom_ 853 MOM.F90
access-om3-MOM6-C 0000000002E1496D mom_ocean_model_n 633 mom_ocean_model_nuopc.F90
access-om3-MOM6-C 0000000002D3505D mom_cap_mod_mp_mo 1759 mom_cap.F90
access-om3-MOM6-C 00000000020A73BF _ZNK5ESMCI13Metho 377 ESMCI_MethodTable.C
access-om3-MOM6-C 00000000020A7338 _ZN5ESMCI11Method 563 ESMCI_MethodTable.C
access-om3-MOM6-C 00000000020A5DBB c_esmc_methodtabl 317 ESMCI_MethodTable.C
access-om3-MOM6-C 0000000000DFD539 esmf_attachmethod 1287 ESMF_AttachMethods.F90
access-om3-MOM6-C 0000000004B83C92 nuopc_modelbase_m 2212 NUOPC_ModelBase.F90
@ezhilsabareesh8 is this crashing even with more lenient checks?
Thanks @aekiss. With the recent changes of setting Z_INIT_REMAP_GENERAL = True and MAX_DELTA_SRESTORE = 999.0, the 1-degree MOM6-CICE6 IAF configuration now runs for 3 years without crashing, even without lenient checks.
Test experiment 2 - RYF 1 deg
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libpthread-2.28.s 00001482944AFCF0 Unknown Unknown Unknown
access-om3-MOM6-C 00000000037C6D16 mom_vert_friction 1713 MOM_vert_friction.F90
access-om3-MOM6-C 0000000003597CC4 mom_dynamics_spli 581 MOM_dynamics_split_RK2.F90
The RYF 1-degree MOM6-CICE6 configuration still crashes with the above error. However, there are significant differences between the MOM_input files of the IAF and RYF configurations, which may be causing the error in RYF but not in IAF. The IAF MOM_input is outdated and needs to be updated.
variable | MOM_input_one_deg_RYF | MOM_input_one_deg_IAF
-- | -- | --
adjust_net_srestore_to_zero | | TRUE
ah_vel_scale | | 0
bbl_use_eos | | TRUE
bt_thick_scheme | | FROM_BT_CONT
cfc_bc_file | | cfc_atm_20230310.nc
coord_config | | none
debug | | FALSE
default_2018_answers | | FALSE
depth_scaled_khth | | FALSE
energysavedays | | 1
eqn_of_state | | WRIGHT
fatal_unused_params | TRUE |
fix_ustar_gustless_bug | | TRUE
gill_equatorial_ld | | TRUE
grid_rotation_angle_bugs | | FALSE
hmix_min | | 2
int_tide_decay_scale | | 300.3003003003003
interp_type2 | | LMD94
interpolate_res_fn | | FALSE
kappa_shear_all_layer_tke_bug | | FALSE
kappa_shear_iter_bug | | FALSE
kdml | | 0
kh_vel_scale | | 0
khth | | 0
khth_max | | 0
khtr_max | | 0
mask_srestore_under_ice | | FALSE
max_ent_it | | 20
max_rino_it | | 25
maxtrunc | | 0
min_salinity | | 0
nihalo | | 4
njhalo | | 4
prandtl_turb | | 1
remap_uv_using_old_alg | | FALSE
simple_tke_to_kd | | TRUE
smag_bi_const | | 0.06
tolerance_ent | | 1e-05
topo_file | | topog.nc
use_cfc_cap | | FALSE
use_contemp_abssal | | FALSE
use_gm_work_bug | | FALSE
use_land_mask_for_hvisc | | TRUE
use_psurf_in_eos | | TRUE
visc_res_scale_coef | | 0.4
z_init_file_salt_var | | salt
z_init_remap_old_alg | | FALSE
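For anyone wanting to reproduce a comparison like the one above, here is a rough Python sketch. It assumes each parameter sits on its own `NAME = value` line with `!` starting a comment, and it does not handle `#override` lines or `BLOCK%` sections. The function names and the sample text (including `SHARED_PARAM`) are made up for illustration and are not taken from the actual configs.

```python
def parse_mom_params(text):
    """Parse simple 'NAME = value' lines, dropping '!' comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # strip trailing comment
        if "=" not in line:
            continue  # blank line, comment-only line, or unsupported syntax
        key, value = line.split("=", 1)
        params[key.strip().lower()] = value.strip()
    return params


def diff_params(a, b):
    """Return (name, value_in_a, value_in_b) for every parameter that differs."""
    rows = []
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            rows.append((key, a.get(key, ""), b.get(key, "")))
    return rows


# Illustrative stand-ins for the two MOM_input files (not the real contents).
RYF_TEXT = """
! hypothetical RYF excerpt
FATAL_UNUSED_PARAMS = True
SHARED_PARAM = 1        ! hypothetical parameter present in both files
"""
IAF_TEXT = """
SHARED_PARAM = 1
HMIX_MIN = 2.0          ! [m]
DEBUG = False
"""

for name, ryf, iaf in diff_params(parse_mom_params(RYF_TEXT), parse_mom_params(IAF_TEXT)):
    print(f"{name} | {ryf} | {iaf}")
```

With real files, `RYF_TEXT` and `IAF_TEXT` would be replaced by the contents of the two MOM_input files, read with `open(path).read()`.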
MOM6-CICE6 1° configs are crashing after running for several weeks/months. Excessively large SSH appears in less than 1 day, without unusual wind stress - see https://github.com/COSIMA/MOM6-CICE6/pull/5#issuecomment-1665023553 and https://github.com/COSIMA/MOM6-CICE6/pull/5#issuecomment-1676484356