E3SM-Project / Omega

Next generation ocean model within E3SM
https://docs.e3sm.org/Omega/omega

Unexpected, unrelated error message when `TimeStepper` is not valid #137

Status: Closed (closed by xylar 2 weeks ago)

xylar commented 2 weeks ago

When I run Omega in debug mode (not sure about release mode) with an omega.yml from Polaris, I'm seeing an error like this:

[chr-0181:4116621:0:4116621] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x81cb0f)
==== backtrace (tid:4116632) ====
 0 0x0000000000012cf0 __funlockfile()  :0
 1 0x0000000000429424 OMEGA::TimeFrac::operator=()  /home/ac.xylar/e3sm_work/omega/develop/components/omega/src/infra/TimeMgr.cpp:984
 2 0x000000000043b6f7 OMEGA::TimeInterval::operator=()  /home/ac.xylar/e3sm_work/omega/develop/components/omega/src/infra/TimeMgr.h:418
 3 0x00000000004b37af OMEGA::TimeStepper::setTimeStep()  /home/ac.xylar/e3sm_work/omega/develop/components/omega/src/timeStepping/TimeStepper.cpp:155
 4 0x00000000004b2f27 OMEGA::TimeStepper::init()  /home/ac.xylar/e3sm_work/omega/develop/components/omega/src/timeStepping/TimeStepper.cpp:113
 5 0x000000000045e67e OMEGA::initOmegaModules()  /home/ac.xylar/e3sm_work/omega/develop/components/omega/src/ocn/OceanInit.cpp:214
 6 0x000000000045c4b0 OMEGA::ocnInit()  /home/ac.xylar/e3sm_work/omega/develop/components/omega/src/ocn/OceanInit.cpp:62
 7 0x00000000004151a0 main()  /home/ac.xylar/e3sm_work/omega/develop/components/omega/src/drivers/standalone/OceanDriver.cpp:30
 8 0x000000000003ad85 __libc_start_main()  ???:0
 9 0x000000000041496e _start()  ???:0

The omega.yml looks like:

Omega:
  TimeManagement:
    StartTime: 0001-01-01_00:00:00
    StopTime: none
    RunDuration: 0000_10:00:00.000
    CalendarType: No Leap
  TimeIntegration:
    TimeStepper: RK4
    TimeStep: 0000_00:10:00.000
  Dimension:
    NVertLevels: 1
  Decomp:
    HaloWidth: 3
    DecompMethod: MetisKWay
  State:
    NTimeLevels: 2
  Advection:
    FluxThicknessType: Center
  Tendencies:
    ThicknessFluxTendencyEnable: true
    PVTendencyEnable: true
    KETendencyEnable: true
    SSHTendencyEnable: true
    VelDiffTendencyEnable: false
    ViscDel2: 1.0e3
    VelHyperDiffTendencyEnable: false
    ViscDel4: 1.2e11

I presume the issue might be the decimal in RunDuration and/or TimeStep. I didn't immediately see what changes would be needed to correctly support fractional seconds here.

xylar commented 2 weeks ago

@philipwjones, can you assign this to whoever you think is the right person if it's not you? Thanks!

hyungyukang commented 2 weeks ago

As discussed on Slack, @xylar, the TimeStepper should be one of the following: Forward-Backward, RungeKutta4, RungeKutta2. In the latest version, I encountered the same error when using RK4. However, I still need to investigate why my previous tests with Polaris worked with RK4...

xylar commented 2 weeks ago

I can confirm that switching to RungeKutta4 fixed the problem for me. It seems like we need error checking that the TimeStepper is valid. Is that missing?

xylar commented 2 weeks ago

@mwarusz, I think you might be the right person to fix this (or to assign it to someone else). The switch statement here: https://github.com/E3SM-Project/Omega/blob/2b346b258821b12208500e373e9b77346b420be1/components/omega/src/timeStepping/TimeStepper.cpp#L20-L27 needs an else clause that should log a critical error.
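For anyone following along, here is a minimal sketch of the kind of check being proposed. It is not the actual Omega code: the names `StepperKind`, `stepperFromString` and `stepperLabel` are hypothetical, and only the three valid option strings come from this thread. The idea is that an unrecognized name maps to an explicit `Invalid` value, and the fallback branch reports the bad name instead of letting initialization continue into an unrelated segfault.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical enum for illustration; the real Omega type may differ.
enum class StepperKind { ForwardBackward, RungeKutta4, RungeKutta2, Invalid };

// Map the config string to an enum value. Any unrecognized name
// (for example "RK4") maps to Invalid rather than being left unset.
StepperKind stepperFromString(const std::string &Name) {
   if (Name == "Forward-Backward")
      return StepperKind::ForwardBackward;
   if (Name == "RungeKutta4")
      return StepperKind::RungeKutta4;
   if (Name == "RungeKutta2")
      return StepperKind::RungeKutta2;
   return StepperKind::Invalid;
}

// Stand-in for the code that builds the stepper. The default branch
// fails with a message that names the bad value and the valid options.
std::string stepperLabel(const std::string &Name) {
   switch (stepperFromString(Name)) {
   case StepperKind::ForwardBackward:
      return "forward-backward";
   case StepperKind::RungeKutta4:
      return "4th-order Runge-Kutta";
   case StepperKind::RungeKutta2:
      return "2nd-order Runge-Kutta";
   default:
      throw std::invalid_argument(
          "Unknown TimeStepper '" + Name +
          "'; valid options are Forward-Backward, RungeKutta4, RungeKutta2");
   }
}
```

With this in place, `stepperLabel("RK4")` fails immediately with a message pointing at the `TimeStepper` entry. The sketch throws an exception only to stay self-contained; in Omega itself the fallback would presumably log a critical error, as suggested above.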

xylar commented 2 weeks ago

Sorry @mwarusz, that's not your code. It's @brian-oneill's but I see how to fix it. I'll just assign it to myself and submit a fix.