NOAA-CEFI-Regional-Ocean-Modeling / ocean_BGC

3 stars 4 forks source link

Getting an error from the equation of state, called from COBALT #29

Closed kshedstrom closed 2 months ago

kshedstrom commented 3 months ago

Yesterday's dev/cefi gave me this stack trace:

fms_MOM6_SIS2_COB 00000000009560C2 mom_eos_mp_calcul     228 MOM_EOS.F90
fms_MOM6_SIS2_COB 0000000001727663 generic_cobalt_mp    8320 generic_COBALT.F90
fms_MOM6_SIS2_COB 0000000001439637 generic_tracer_mp     563 generic_tracer.F90
fms_MOM6_SIS2_COB 0000000001351345 mom_generic_trace     582 MOM_generic_tracer.F90
fms_MOM6_SIS2_COB 0000000000FD1A8F mom_tracer_flow_c     529 MOM_tracer_flow_control.F90
fms_MOM6_SIS2_COB 0000000000A2F3E3 mom_diabatic_driv    1607 MOM_diabatic_driver.F90
fms_MOM6_SIS2_COB 0000000000A24105 mom_diabatic_driv     404 MOM_diabatic_driver.F90
fms_MOM6_SIS2_COB 00000000009D2020 mom_mp_step_mom_t    1602 MOM.F90
fms_MOM6_SIS2_COB 00000000009C78FB mom_mp_step_mom_     851 MOM.F90
fms_MOM6_SIS2_COB 00000000009751C2 ocean_model_mod_m     594 ocean_model_MOM.F90
fms_MOM6_SIS2_COB 0000000000413DFF MAIN__          1063 coupler_main.F90

I recompiled with ec631ef65375de78c35a1891e67502b49b230507 and it runs past this (and runs into the imbalance issues).

andrew-c-ross commented 3 months ago

The depth index for the temperature and salinity used to calculate density, kmld_ref, is a float rather than an int. I think this was also present in ec631ef, so maybe it's not the problem, but just wanted to point that out.

https://github.com/NOAA-CEFI-Regional-Ocean-Modeling/ocean_BGC/blob/6e6ed114a711f060da78aece6170a4369b2ee32c/generic_tracers/generic_COBALT.F90#L7915

https://github.com/NOAA-CEFI-Regional-Ocean-Modeling/ocean_BGC/blob/6e6ed114a711f060da78aece6170a4369b2ee32c/generic_tracers/generic_COBALT.F90#L8320

yichengt900 commented 3 months ago

@kshedstrom, thank you for reporting, and @andrew-c-ross, thank you for your prompt check and response. I agree that it's probably better to change kmld_ref from real to int to avoid confusion. However, as @andrew-c-ross pointed out, it exists in both tags, so there may be something else causing the issue. I just did a quick test on Gaea using 1D, NWA, and NEP cases but could not reproduce this error.

@kshedstrom, would you mind providing more details? Did this error occur on Gaea or your HPC system? Also, what kind of compiler did you use in your case? Thank you.

kshedstrom commented 3 months ago

@yichengt900 This is on gaea, the script: /ncrc/home2/Katherine.Hedstrom/ESMG/ESMG-configs/Arctic6/BGC/run62a.xml The detritus is around still too, under /gpfs/f5/cefi/scratch/Katherine.Hedstrom/work

charliestock commented 3 months ago

This is mysterious. Are we sure this wasn't a system issue that the EOS happened to catch in COBALT? If you run the same simulation again, does it crash in the same place?

Nice catch on k_mld Andrew. It would be great if did not have to calculate densities or potential densities at all. This has been suggested with the MOM6 team, but there is some reluctance that I'm not sure I fully understand....

andrew-c-ross commented 3 months ago

If I'm looking at the right run, the traceback includes

Image              PC                Routine            Line        Source             
fms_MOM6_SIS2_COB  00000000022142DB  Unknown               Unknown  Unknown
libpthread-2.31.s  00001515B8E89910  Unknown               Unknown  Unknown
libpthread-2.31.s  00001515B8E887D3  close                 Unknown  Unknown
libdarshan.so.0.0  00001515B84223F3  close                 Unknown  Unknown
libhdf5_intel.so.  00001515B882AC3B  Unknown               Unknown  Unknown
libhdf5_intel.so.  00001515B8818692  H5FD_close            Unknown  Unknown
libhdf5_intel.so.  00001515B88009F3  Unknown               Unknown  Unknown
libhdf5_intel.so.  00001515B8801B04  H5F_try_close         Unknown  Unknown
libhdf5_intel.so.  00001515B8801779  H5F__close            Unknown  Unknown
libhdf5_intel.so.  00001515B8A68976  H5VL__native_file     Unknown  Unknown
libhdf5_intel.so.  00001515B8A51CD8  H5VL_file_close       Unknown  Unknown
libhdf5_intel.so.  00001515B87FB574  Unknown               Unknown  Unknown
libhdf5_intel.so.  00001515B889107C  H5I_clear_type        Unknown  Unknown
libhdf5_intel.so.  00001515B87FB50E  H5F_term_package      Unknown  Unknown
libhdf5_intel.so.  00001515B8727905  H5_term_library       Unknown  Unknown
libc-2.31.so       00001515B4456B79  Unknown               Unknown  Unknown
libc-2.31.so       00001515B4456D0A  Unknown               Unknown  Unknown
libinfinipath.so.  00001515AEE04602  Unknown               Unknown  Unknown
libpthread-2.31.s  00001515B8E89910  Unknown               Unknown  Unknown

This sounds a lot like some of the filesystem problems we've been having.

yichengt900 commented 3 months ago

@kshedstrom, thank you for sharing the XML. I noticed that for the MOM6 version, you were pointing to ESMG/MOM6 (which is up to date with dev/gfdl).

It is suitable for the physics-only run, which includes the OBC fixes, but it will not work with COBALTv3.0 due to the lack of a few minor tracer interface changes (we simplified the changes back to January). In order to make MOM6 work with COBALTv3, unfortunately you will need to include these changes in MOM6 generic_tracer.F90.

You can either create a new branch with these changes in ESMG, or use dev/cefi MOM6 , which is almost identical to dev/gfdl and dev/esmg except for the interface changes.

kshedstrom commented 3 months ago

I had three separate runs which all failed in the call to EOS.

I had a version with some interface changes, but I thought you had backed them out. How did this compile?

yichengt900 commented 3 months ago

The new COBALTv3 requires geolat (for day length calculation) and eqn_of_state (for mixed layter density calculation), which need to be passed from MOM6 during generic_tracer_source call.

I checked the source codes you used under /gpfs/f5/cefi/scratch/Katherine.Hedstrom/fre/Arc_12:

  1. MOM6_SIS2_COBALT_compile_symm/src/mom6/src/MOM6/src/tracer/MOM_generic_tracer.F90: no interface change
  2. MOM6_SIS2_COBALT_compile_symm/src.20240403.134121/mom6/src/MOM6/src/tracer/MOM_generic_tracer.F90: no interface change
  3. MOM6_SIS2_COBALT_compile_symm_a/src/mom6/src/MOM6/src/tracer/MOM_generic_tracer.F90: no interface change
  4. MOM6_SIS2_compile_ciod/src/mom6/src/MOM6/src/tracer/MOM_generic_tracer.F90: old interface changes before #7

I think the reason why you can still compile the code is that we set geolat and eqn_of_state as optional inputs. This allows the generic_tracer_source to still work with COBALTv2.

kshedstrom commented 3 months ago

So do I need to be updating to pass the EOS as an optional argument? Is that what's failing? Should it fail in a more informative way if it doesn't get the EOS?

yichengt900 commented 3 months ago

Yes, you will need them if you are using COBALT from the latest dev/cefi ocean_BGC branch. @charliestock and @theresa-morrison , just keeping both you in the loop. If we decide to keep the density calculation within COBALTv3, we could consider adding a warning/fatal error as @kshedstrom suggested.

yichengt900 commented 2 months ago

This issue has been address by PR#33. Will close it for now.