NOAA-GFDL / CEFI-regional-MOM6

A repository containing essential tools, XML files, and source codes for collaborators of the Climate, Ecosystems, and Fisheries Initiative (CEFI) to conduct simulations.

Runtime error in Arctic #103

Open kshedstrom opened 1 week ago

kshedstrom commented 1 week ago

The standard output is here: /gpfs/f6/ira-cefi/scratch/Katherine.Hedstrom/fre/Arc_12//run93/ncrc6.intel23-prod-hdf5/stdout/run/run93.o207213298

Trouble starts at line 4034. I've got exactly the same thing running now, the only difference being that I have the esmg_work branch of MOM6 instead of the default from a cefi build. This is supposed to be one in a series of Icepack tests.

yichengt900 commented 1 week ago

Thanks, @kshedstrom. At first I couldn’t access the stdout you posted because of permission issues, but I can see it now; interesting error. I tested your XML with the dev/cefi MOM6 (XML available here: /gpfs/f6/ira-cefi/world-shared/upload/run93.xml) and successfully ran a one-month regression test (output available here: /archive/ynt/fre/Arc_12/run93/run93/gfdl.ncrc6-intel23-prod-hdf5/1x1m0d_1648x1o).

Since this is a physics-only run, the dev/cefi MOM6 should behave the same as dev/gfdl MOM6. It looks like the esmg MOM6 is 33 commits behind the dev/gfdl MOM6, but I’m not sure whether any of those commits are relevant to the errors you encountered, or whether the issue is machine-related. CC: @theresa-morrison.

kshedstrom commented 1 week ago

Sorry about the permissions issues. The trouble looks like:

WARNING from PE 1063: Bad ice state enth_ice End of ice_state_cleanup ; at 43.8 66.4 or i,j,k = 5 15 2; nbad = 1 on pe 1063 ; part_size = 6.9389E-18

WARNING from PE 1063: mi/ms = 2.2625E+02 0.0000E+00 ts = 0.0000E+00 ti = -1.2317E+02, -1.0487E+00, -1.3254E+00, -1.3891E+00

WARNING from PE 1063: enth_snow = 0.0000E+00 enth_ice = -5.9323E+05, -2.9661E+05, -2.9661E+05, -2.9661E+05

WARNING from PE 1063: salin_ice = 2.0000E+00, 2.0000E+00, 4.0000E+00, 2.0000E+00

A slightly different run with the same cefi source code died like this:

WARNING from PE 53: btstep: eta has dropped below bathyT: -1.1715110196008430E+01 vs. -1.0916015625000000E+01 at -1.7915E+02 5.1221E+01 343 27

I don't know what's going on. Both cases ran using esmg_work MOM6.
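
For anyone triaging a stdout like the ones quoted above, here is a minimal Python sketch (not from the thread) that groups `WARNING from PE` messages by PE and pulls out the numeric fields such as `ti` or `enth_ice`. The function names `warnings_by_pe` and `parse_floats` are hypothetical, and the regular expression assumes only the message layout visible in the excerpts; other MOM6/SIS2 messages may be formatted differently.

```python
import re
from collections import defaultdict

# Layout assumed from the excerpts above: "WARNING from PE <n>: <text>"
WARNING_RE = re.compile(r"WARNING from PE\s+(\d+):\s*(.*)")


def warnings_by_pe(lines):
    """Group warning texts by PE number, keeping their order of appearance."""
    grouped = defaultdict(list)
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            grouped[int(match.group(1))].append(match.group(2).strip())
    return dict(grouped)


def parse_floats(text, key):
    """Return the run of numbers that follows '<key> =' in a warning text."""
    match = re.search(r"(?<!\w)" + re.escape(key) + r"\s*=\s*", text)
    if not match:
        return []
    values = []
    # Numbers may be separated by commas or spaces; stop at the first
    # token that is not a number (for example the next field name).
    for token in text[match.end():].replace(",", " ").split():
        try:
            values.append(float(token))
        except ValueError:
            break
    return values


if __name__ == "__main__":
    sample = [
        "WARNING from PE 1063: mi/ms = 2.2625E+02 0.0000E+00 ts = 0.0000E+00 "
        "ti = -1.2317E+02, -1.0487E+00, -1.3254E+00, -1.3891E+00",
        "WARNING from PE 1063: enth_snow = 0.0000E+00 enth_ice = -5.9323E+05, "
        "-2.9661E+05, -2.9661E+05, -2.9661E+05",
    ]
    for pe, texts in warnings_by_pe(sample).items():
        for text in texts:
            print(pe, parse_floats(text, "ti"), parse_floats(text, "enth_ice"))
```

Run on the lines quoted above, this makes it easier to see that the first ice-layer value stands apart from the other three in both `ti` and `enth_ice`.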

Kate


theresa-morrison commented 1 week ago

Kate, can you point me to the esmg_work MOM6 executable? Or a standard out from that experiment?


kshedstrom commented 1 week ago

The outputs are in: /archive/Katherine.Hedstrom/fre/Arc_12/run93/gfdl.ncrc6-intel23-prod-hdf5/ascii

Executables and sources are under: /gpfs/f6/ira-cefi/scratch/Katherine.Hedstrom/fre/Arc_12/MOM6_SIS2_compile_icetest