MPAS-Dev / compass

Configuration Of MPAS Setups

Ocean test cases failing in CVMix with Gnu and debug #513

Open xylar opened 1 year ago

xylar commented 1 year ago

Both ocean/isomip_plus/5km/z-star/Ocean0 and the new ocean/overflow/10km/default test case from #501 fail on Chrysalis with Gnu in debug mode, with a stack trace like:

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x15554f15a3ff in ???
#1  0x15554fdd8010 in ???
#2  0xf955ca in __cvmix_convection_MOD_cvmix_coeffs_conv_wrap
    at /home/ac.xylar/mpas-work/compass/main/e3sm_chrys_gnu_debug/components/mpas-ocean/src/cvmix/src/shared/cvmix_convection.F90:227
#3  0xf0bc32 in __ocn_vmix_cvmix_MOD_ocn_vmix_coefs_cvmix_build
    at /home/ac.xylar/mpas-work/compass/main/e3sm_chrys_gnu_debug/components/mpas-ocean/src/shared/mpas_ocn_vmix_cvmix.F:739
#4  0xcb8dc7 in __ocn_vmix_MOD_ocn_vmix_coefs
    at /home/ac.xylar/mpas-work/compass/main/e3sm_chrys_gnu_debug/components/mpas-ocean/src/shared/mpas_ocn_vmix.F:200
#5  0xcadb88 in __ocn_vmix_MOD_ocn_vmix_implicit
    at /home/ac.xylar/mpas-work/compass/main/e3sm_chrys_gnu_debug/components/mpas-ocean/src/shared/mpas_ocn_vmix.F:1154
#6  0xb4d48b in __ocn_time_integration_split_MOD_ocn_time_integrator_split
    at /home/ac.xylar/mpas-work/compass/main/e3sm_chrys_gnu_debug/components/mpas-ocean/src/mode_forward/mpas_ocn_time_integration_split.F:2424
#7  0xb46735 in __ocn_time_integration_MOD_ocn_timestep
    at /home/ac.xylar/mpas-work/compass/main/e3sm_chrys_gnu_debug/components/mpas-ocean/src/mode_forward/mpas_ocn_time_integration.F:125
#8  0xb435d5 in __ocn_forward_mode_MOD_ocn_forward_mode_run
    at /home/ac.xylar/mpas-work/compass/main/e3sm_chrys_gnu_debug/components/mpas-ocean/src/mode_forward/mpas_ocn_forward_mode.F:728
#9  0xb425e5 in __ocn_core_MOD_ocn_core_run
    at /home/ac.xylar/mpas-work/compass/main/e3sm_chrys_gnu_debug/components/mpas-ocean/src/driver/mpas_ocn_core.F:111
#10  0x40f39a in __mpas_subdriver_MOD_mpas_run
    at /home/ac.xylar/mpas-work/compass/main/e3sm_chrys_gnu_debug/components/mpas-framework/src/driver/mpas_subdriver.F:358
#11  0x40e113 in mpas
    at /home/ac.xylar/mpas-work/compass/main/e3sm_chrys_gnu_debug/components/mpas-framework/src/driver/mpas.F:20
#12  0x40e17e in main
    at /home/ac.xylar/mpas-work/compass/main/e3sm_chrys_gnu_debug/components/mpas-framework/src/driver/mpas.F:10

xylar commented 1 year ago

I used the bisect utility and, to my surprise, traced this back to: https://github.com/E3SM-Project/E3SM/pull/5047

I will look into it further but may need another set of eyes from @sbrus89.
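
For anyone unfamiliar with bisection, here is a minimal sketch of the idea in Python. It is not the bisect utility itself: the commit labels and the `fails_in_debug` check are hypothetical stand-ins for building a commit and running the failing test case, and it assumes every commit after the first bad one also fails.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad(commit) is True.

    Assumes commits are ordered oldest to newest, the first one is known
    good, the last one is known bad, and every commit after the first
    bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # failure was introduced at or before mid
        else:
            lo = mid  # failure was introduced after mid
    return commits[hi]


# Hypothetical history; in practice each check is a build plus a test run.
history = ["a1", "b2", "c3", "d4", "e5", "f6"]
tested = []


def fails_in_debug(commit: str) -> bool:
    """Hypothetical check: pretend the failure first appears at 'd4'."""
    tested.append(commit)
    return history.index(commit) >= history.index("d4")


culprit = first_bad_commit(history, fails_in_debug)
print(culprit, tested)
```

Each check halves the remaining range, so a long history needs only a handful of builds.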

xylar commented 1 year ago

@sbrus89, I have given https://github.com/E3SM-Project/E3SM/pull/5047/files#diff-fe454e719efd5b684941f8249d696988c408e5720b7c5948a14be02a84adfc7a a pretty thorough look and don't see anything obvious.

Can you see if you can reproduce the problem using Gnu in debug mode on Chrysalis? Since you made these changes, I'm hoping you'll have a better idea of which field in cvmix_variables might have gotten corrupted, might not be initialized, etc. after the changes.
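
One way to narrow down a suspect field is to scan each candidate array for NaN or infinite values just before the failing call. The sketch below shows the idea in Python. The field names and values are hypothetical and are not the actual members of cvmix_variables; a real check would read model output or be written in Fortran next to the call.

```python
import math
from typing import Dict, List, Sequence


def find_nonfinite(fields: Dict[str, Sequence[float]]) -> Dict[str, List[int]]:
    """Map each field name to the indices that hold NaN or infinity.

    Fields whose values are all finite are left out of the report.
    """
    report: Dict[str, List[int]] = {}
    for name, values in fields.items():
        bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
        if bad:
            report[name] = bad
    return report


# Hypothetical field names and values, for illustration only.
sample = {
    "diffusivity": [1.0e-5, 1.0e-5, 1.0e-4],
    "density": [1027.0, float("nan"), 1028.5],
    "layer_thickness": [10.0, 10.0, float("inf")],
}

suspects = find_nonfinite(sample)
print(suspects)
```

A field that shows up in the report is a candidate for being uninitialized or overwritten, though finite garbage values would not be caught this way.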

sbrus89 commented 1 year ago

@xylar, thanks for tracking this down. I'll take a look and see if I can figure anything out.