NOAA-OWP / noah-owp-modular

Modularized version of the NOAH-MP land surface model.
Other
8 stars 19 forks source link

Replace `stop` with `return` and provide and `bmi_failure` flags for EnergyModule error #108

Open SnowHydrology opened 4 months ago

SnowHydrology commented 4 months ago

This PR replaces the stop statement with return in EnergyModule.f90. This will allow the NextGen framework to provide basin and timestep information when BMI_FAILURE is reported.

Changes:

drakest123 commented 4 months ago

I agree with @PhilMiller that we need to immediately propagate an error up the call stack to avoid side effects of a late return from an error. I don’t see how to avoid conditional statements that check for errors before the end of a subroutine but most of the stop statements come from initialization checks at the beginning of a model run. In that case, conditional statements would have minimal impact on execution speed so the remaining consideration is with cluttering the code with conditional statements. Based on the example in PR #108, I’ve prototyped dealing with upstream error propagation. Logging functionality is included (in log_error() subroutine) such that write statements are contained within a single subroutine. log_error() would replace handle_err() and sys_abort() subroutines. Consistent with NGEN, existing preprocessor directives would control program flow (in driver/NoahModularDriver.f90) and logging status for non-fatal messages (in log_error() . Subroutines that include an error check would need to use ErrorCheckModule. ErrorCheckModule contains three public parameters:

  integer, parameter, public :: NOM_SUCCESS = 0
  integer, parameter, public :: NOM_FAILURE = 1
  integer, parameter, public :: NOM_MESSAGE = 2 

The values of NOM_SUCCESS and NOM_FAILURE are consistent with BMI_SUCCESS and BMI_FAILURE. Impacted files are:

% git status
On branch error_handling
Your branch is up to date with 'origin/error_handling'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   ../bmi/bmi_noahowp.f90
    modified:   ../driver/NoahModularDriver.f90
    modified:   EnergyModule.f90
    modified:   ErrorCheckModule.f90
    modified:   RunModule.f90

File additions/changes are:

% git diff
diff --git a/bmi/bmi_noahowp.f90 b/bmi/bmi_noahowp.f90
index d26aaef..3178acc 100644
--- a/bmi/bmi_noahowp.f90
+++ b/bmi/bmi_noahowp.f90
@@ -267,14 +267,14 @@ contains
   function noahowp_update(this) result (bmi_status)
     class (bmi_noahowp), intent(inout) :: this
     integer :: bmi_status
+    bmi_status = BMI_SUCCESS

     call advance_in_time(this%model)
-    if (this%model%domain%error_flag /= 0) then
+    if (this%model%domain%error_flag == BMI_FAILURE) then
       bmi_status = BMI_FAILURE
       return
-    else
-      bmi_status = BMI_SUCCESS
     end if
+
   end function noahowp_update

   ! Advance the model until the given time.
diff --git a/driver/NoahModularDriver.f90 b/driver/NoahModularDriver.f90
index a86ef93..f799713 100644
--- a/driver/NoahModularDriver.f90
+++ b/driver/NoahModularDriver.f90
@@ -43,6 +43,14 @@ program model_driver
   print*, "Running..."
   do while (current_time < end_time)
     status = m%update()                       ! run the model one time step
+    if (status == BMI_FAILURE) then
+#ifdef NGEN_ACTIVE
+      return status                           ! if NGEN
+#else
+      print*, "Stopping program."
+      stop
+#endif
+    end if
     status = m%get_current_time(current_time) ! update current_time
   end do

diff --git a/src/EnergyModule.f90 b/src/EnergyModule.f90
index 22c7577..4d0f44e 100644
--- a/src/EnergyModule.f90
+++ b/src/EnergyModule.f90
@@ -2,6 +2,7 @@ module EnergyModule

   use LevelsType
   use DomainType
+  use ErrorCheckModule
   use OptionsType
   use ParametersType
   use WaterType
@@ -45,6 +46,7 @@ contains
     REAL                                 :: D_RSURF  ! Reduced vapor diffusivity in soil for computing RSURF (SZ09)

     REAL                                 :: FIRE   !emitted IR (w/m2)
+    CHARACTER(len=256)                   :: error_string
     !---------------------------------------------------------------------

     ! Initialize the the fluxes from the vegetated fraction
@@ -299,13 +301,20 @@ contains
     END IF

     FIRE = forcing%LWDN + energy%FIRA
+    FIRE = -1     ! TESTING ERROR HANDLING
     IF(FIRE <=0.) THEN
-      domain%error_flag = 1
-      WRITE(*,*) 'emitted longwave <0; skin T may be wrong due to inconsistent'
-      WRITE(*,*) 'input of SHDFAC with LAI'
-      WRITE(*,*) domain%ILOC, domain%JLOC, 'SHDFAC=',parameters%FVEG,'parameters%VAI=',parameters%VAI,'TV=',energy%TV,'TG=',energy%TG
-      WRITE(*,*) 'LWDN=',forcing%LWDN,'energy%FIRA=',energy%FIRA,'water%SNOWH=',water%SNOWH
-      WRITE(*,*) 'Exiting ...'
+      domain%error_flag = NOM_FAILURE
+      !WRITE(*,*) 'emitted longwave <0; skin T may be wrong due to inconsistent'
+      !WRITE(*,*) 'input of SHDFAC with LAI'
+      !WRITE(*,*) domain%ILOC, domain%JLOC, 'SHDFAC=',parameters%FVEG,'parameters%VAI=',parameters%VAI,'TV=',energy%TV,'TG=',energy%TG
+      !WRITE(*,*) 'LWDN=',forcing%LWDN,'energy%FIRA=',energy%FIRA,'water%SNOWH=',water%SNOWH
+      !WRITE(*,*) 'Exiting ...'
+      write(error_string,101) 'EnergyMain: emitted longwave <0; skin T &
+       may be wrong due to inconsistent input of SHDFAC with LAI. ILOC=', &
+       domain%ILOC, ', JLOC=',domain%JLOC, ', SHDFAC=',parameters%FVEG,', &
+       parameters%VAI=',parameters%VAI,', TV=',energy%TV,', TG=',energy%TG
+101   format(A,I10,A,I10,A,F8.3,A,F8.3,A,F8.3,A,F8.3)
+      call log_message(domain%error_flag, trim(error_string))
       RETURN
     END IF

diff --git a/src/ErrorCheckModule.f90 b/src/ErrorCheckModule.f90
index f330696..33ef516 100644
--- a/src/ErrorCheckModule.f90
+++ b/src/ErrorCheckModule.f90
@@ -1,12 +1,15 @@
 module ErrorCheckModule

-  ! General error checking routins
-
+  ! General error checking routines
   implicit none
+  integer, parameter, public :: NOM_SUCCESS = 0
+  integer, parameter, public :: NOM_FAILURE = 1
+  integer, parameter, public :: NOM_MESSAGE = 2

   private
   public:: sys_abort
   public:: is_within_bound
+  public:: log_message

   interface is_within_bound
     module procedure is_within_bound_int
@@ -32,6 +35,29 @@ contains

   end subroutine sys_abort

+  subroutine log_message(err, message)
+
+    ! log information, typically an error
+
+    implicit none
+
+    integer, intent(in) :: err                  ! error code
+    character(*), intent(in) :: message         ! message
+
+    ! If error, write the error. If message, write message unless NGEN_QUIET
+    if(err==NOM_FAILURE)then
+      write(*, '(A,I2,A)') ' Error Code: ', err, ',  Message: '//trim(message)
+      call flush(6)
+    endif
+#ifndef NGEN_QUIET
+    if(err==NOM_MESSAGE)then
+      write(*, '(A,I2,A)') ' Error Code: ', err, ',  Message: '//trim(message)
+      call flush(6)
+    endif
+#endif
+
+  end subroutine log_message
+
   function is_within_bound_int(var, lower_bound, upper_bound) result(withinbound)

     ! check if a integer value is within specified bounds
diff --git a/src/RunModule.f90 b/src/RunModule.f90
index cb357b8..fdcb20c 100644
--- a/src/RunModule.f90
+++ b/src/RunModule.f90
@@ -18,8 +18,10 @@ module RunModule
   use EnergyModule
   use WaterModule
   use DateTimeUtilsModule
+  use ErrorCheckModule

   implicit none
+
   type :: noahowp_type
     type(namelist_type)   :: namelist
     type(levels_type)     :: levels
@@ -246,6 +248,9 @@ contains
     type (noahowp_type), intent (inout) :: model

     call solve_noahowp(model)
+    if (model%domain%error_flag == NOM_FAILURE) then
+      return
+    end if

     model%domain%itime    = model%domain%itime + 1 ! increment the integer time by 1
     model%domain%time_dbl = dble(model%domain%time_dbl + model%domain%dt) ! increment model time in seconds by DT
@@ -308,6 +313,9 @@ contains
     !---------------------------------------------------------------------

     call EnergyMain (domain, levels, options, parameters, forcing, energy, water)
+    if (domain%error_flag == NOM_FAILURE) then
+      return
+    end if

     !---------------------------------------------------------------------
     ! call the main water routines (canopy + snow + soil water components)
GreyREvenson commented 3 months ago

Offering my thoughts for consideration, @SnowHydrology

First, on the overall approach, the use of an error flag variable seems appropriate (e.g., domain%error_flag). I failed to find a superior solution and am not confident that one will be found (see Does exception handling exist in Fortran?). Unfortunately, it does seem like we need to add flag checks (e.g., if (domain%error_flag .eq. NOM_FAILURE) return -- the one line syntax could help decrease clutter) immediately after any call to a subroutine within which the error flag could be set. This is a lot of clutter, as @drakest123 noted, but would be necessary to satisfy @PhilMiller's stipulation that the early return be immediately propagated up the stack. I don't know how else to do it.

Second, regarding the error message, I'd advocate for saving the error message and printing when NGEN would print the time step and location information for the error (if possible). Instead, this code (to my current understanding) calls log_message (in ErrorCheckModule) to print the error message when the error is encountered.

Third, @drakest123, I'm not sure if you meant to add your commits atop those from @SnowHydrology in his error_handling branch but I'm noting that here as it caused some initial confusion on my part.

drakest123 commented 3 months ago

Updated error handling, addressing some of Grey Evenson's suggestions. Ready for review by @SnowHydrology. Changes include: