NCAR / DART

Data Assimilation Research Testbed
https://dart.ucar.edu/
Apache License 2.0

bug: dart doesn't catch 0 size model advance windows (hangs for subroutine callable models) #535

Closed nancycollins closed 1 week ago

nancycollins commented 1 year ago


Describe the bug

  1. Compile the Lorenz 96 model (L96).
  2. In input.nml, set the model_nml items time_step_days and time_step_seconds to 0.
  3. Run perfect_model_obs (pmo) or filter.
  4. The run never finishes (it loops forever).

For larger models with real data times (not 0,0), setting the window size to 0 gives a window that starts at T+1 second and ends at T, which can't contain any of the obs.
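
The empty window can be reproduced with a few lines of arithmetic. The sketch below is Python rather than DART's Fortran. It assumes the window is centered on the model time T with a half-width of time_step / 2 and a start offset by one second. That formula is an assumption chosen to match the T+1 second / T behavior described above, and `assimilation_window` and `window_contains` are hypothetical helpers, not DART routines.

```python
from datetime import datetime, timedelta


def assimilation_window(model_time, time_step):
    """Return (start, end) of a window centered on model_time.

    Assumed formula (not copied from DART):
    start = T - step/2 + 1 second, end = T + step/2.
    """
    half = time_step / 2
    start = model_time - half + timedelta(seconds=1)
    end = model_time + half
    return start, end


def window_contains(start, end, obs_time):
    """True if obs_time falls inside the closed interval [start, end]."""
    return start <= obs_time <= end


T = datetime(2023, 1, 1)  # arbitrary example time

# Zero-length time step: the window starts one second after it ends.
start, end = assimilation_window(T, timedelta(0))
print(start > end)
print(window_contains(start, end, T))

# One-hour time step: the window brackets T as expected.
start, end = assimilation_window(T, timedelta(hours=1))
print(window_contains(start, end, T))
```

With a zero time step the start lands after the end, so even an observation exactly at T is rejected.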

Error Message

L96 loops forever. WRF prints:

PE 0:
PE 0: move_ahead Next available observation at: day= 154166 sec=0
PE 0: move_ahead Next assimilation window starts at: day= 154166 sec=1
PE 0: move_ahead Current model data time is: day= 154166 sec=0
PE 0:
ERROR FROM:
  source : obs_model_mod.f90
  routine: move_ahead
  message: Inconsistent model state/observation times, cannot continue
  message: ... If this is the start of the obs_seq file,
  message: ... can use filter namelist to set first obs or initial data time.

Which model(s) are you working with?

any

Version of DART

the current main branch

Have you modified the DART code?

Not yet, but I think the fix is to add a check in get_closest_state_time_to() in assim_model_mod.f90 that errors out if time_step is <= 0.
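
A minimal sketch of that check, in Python rather than Fortran and with times as integer seconds. The stepping loop is a simplified stand-in for the one in get_closest_state_time_to(), not a copy of it. The function name `closest_state_time` and the use of `ValueError` are hypothetical.

```python
def closest_state_time(model_time, given_time, time_step):
    """Step model_time forward by time_step to the state time closest to given_time.

    All arguments are integer seconds. This is an illustrative stand-in,
    not the DART routine.
    """
    if time_step <= 0:
        # Proposed guard: without it the loop below cannot advance, so it
        # never exits whenever given_time is later than model_time.
        raise ValueError(f"time_step must be positive, got {time_step}")

    closest = model_time
    # Advance while given_time lies past the midpoint between the current
    # candidate and the next step.
    while 2 * closest + time_step < 2 * given_time:
        closest += time_step
    return closest


print(closest_state_time(0, 100, 30))

try:
    closest_state_time(0, 100, 0)
except ValueError as err:
    print(f"rejected: {err}")
```

Failing early in this routine would turn the hang into an immediate, explainable error for every model, since the check does not depend on model-specific code.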

Build information

any machine, any compiler.

hkershaw-brown commented 1 month ago

https://github.com/NCAR/DART/blob/c8ddfac035a39bbb24451ced3b0aa5410ac62494/assimilation_code/modules/assimilation/obs_model_mod.f90#L182-L189

hkershaw-brown commented 4 weeks ago

The hang is in the do while loop:

https://github.com/NCAR/DART/blob/75cf8dc9c566221f624ffd4d5eeba9fde5a1757c/assimilation_code/modules/assimilation/assim_model_mod.f90#L106-L108

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x0000000193bbbba0 libsystem_platform.dylib`__bzero
    frame #1: 0x00000001939f286c libsystem_malloc.dylib`nanov2_malloc_zero_on_alloc + 548
    frame #2: 0x00000001019ca4d8 libgfortran.5.dylib`___lldb_unnamed_symbol1692 + 20
    frame #3: 0x0000000101a895b4 libgfortran.5.dylib`___lldb_unnamed_symbol2126 + 56
    frame #4: 0x0000000101a834b0 libgfortran.5.dylib`___lldb_unnamed_symbol2013 + 128
    frame #5: 0x0000000101a81af4 libgfortran.5.dylib`___lldb_unnamed_symbol1992 + 88
    frame #6: 0x0000000101089dfc filter`__time_manager_mod_MOD_set_time at time_manager_mod.f90:177:73
    frame #7: 0x0000000101089a08 filter`__time_manager_mod_MOD_increment_time at time_manager_mod.f90:269:70
    frame #8: 0x000000010108903c filter`__time_manager_mod_MOD_time_plus at time_manager_mod.f90:431:60
    frame #9: 0x0000000100f36a38 filter`__assim_model_mod_MOD_get_closest_state_time_to at assim_model_mod.f90:107:68
    frame #10: 0x0000000101037800 filter`__obs_model_mod_MOD_move_ahead at obs_model_mod.f90:180:54
    frame #11: 0x0000000100fa3150 filter`__filter_mod_MOD_filter_main at filter_mod.f90:1081:57
    frame #12: 0x0000000101089f50 filter`MAIN__ at filter.f90:20:18
    frame #13: 0x0000000101089fa0 filter`main at filter.f90:11:55
    frame #14: 0x0000000193863e50 dyld`start + 2544