fortran-lang / stdlib

Fortran Standard Library
https://stdlib.fortran-lang.org
MIT License
1.06k stars 164 forks source link

Handling runtime errors/exceptions #212

Open nshaffer opened 4 years ago

nshaffer commented 4 years ago

This issue is to discuss approaches to handle runtime errors and exceptions in stdlib. I have in mind scenarios such as

These are conditions which cannot be identified at compile time, but if left alone will either cause user code to crash or be erroneous. Currently in stdlib, we have check, which can print error messages and optionally terminate execution. I think it is not sufficient for general-purpose runtime checking. To me, there are a few major considerations when thinking about runtime checking

  1. Users must be given the opportunity to recover from the error if at all possible. This is especially important for library code, which may be difficult to debug depending on the installation/distribution. It is also generally rude for library code to kill execution without giving the user any say in the matter.

  2. It should be possible to recover from runtime errors without sacrificing purity. If a routine is manifestly pure, then it should not have to sacrifice the pure attribute just to have some error checking. If I write a factorial function, making it pure and handling negative arguments should not have to be an either/or proposition.

  3. Checking and handling errors should not be unduly burdensome to users. If a function call requires one line, and handling its possible error conditions requires ten, users will simply not bother with error checking.

This list of criteria is not exhaustive, but they are the three that are most important to me. That stated, here are the runtime checking approaches I am most familiar with and how they stack up:

  1. Return error code and message as optional out-params.

    • Does not play nicely with pure functions (pure subroutines OK, though)
    • Intrinsic functions do not do this (but statements do, e.g., iostat, iomsg)
    • Passes the buck to the user to interpret and handle the error code/message
  2. Die with error stop

    • As of f2018, can be used in pure functions
    • Kind of rude for library code to kill execution without chance of recovery
  3. Return a special value

    • This is what intrinsics usually do, e.g., index returning -1
    • Some algorithms may not have a natural "obviously wrong" value (maybe NaN?)
    • Works with pure functions/subroutines
  4. Raise an exception

    • Maybe in 2045?

Of these, my preference is strongly toward approach 3 whenever possible.

epagone commented 4 years ago

Very nice description and, FWIW, I prefer approach 3 as well.

One problem that I have faced a few times in the design of such routines is how this API communicates when nested at multiple levels of depth. Say the user is running a numerical integrator within a root-finding problem that is, in turn, used to solve a system of a differential equations. We might have three levels of depth in this case (admittedly, probably rare and fictitious), but I have personally used (within the scope of stdlib) two levels of depth in several cases. I'd guess OO design would help in this sense (with separate instances of the "error handler class") but I know it is discouraged in stdlib.

certik commented 4 years ago

One thing that we should try to avoid is to force every user to be checking every single function from stdlib to ensure it succeeded. We should figure out how to allow users to check it if they want to and handle the error themselves, but we should also allow users to just call a given stdlib function and it would fail with a nice error message if something goes wrong.

So I would say a combination of 3. and 1. is the way to go.

The option 2. should be combined with the option 1. I.e., if the optional stat argument is not present, then it will fail with error stop. If stat is present, it would allow the user to handle the error. This is great for functions such as open or allocate.

The option 3. is great for functions such as find to find an element in an array or a string, and return -1 if it does not exist.

jvdp1 commented 4 years ago

Thank you for these clear propositions. It is related to #95 too.

The options 1., 2., and 3. could be used for different purposes/procedures. For example, returning -1 or even NaN for one of the statistical procedure when a dim > 15 is provided would make no sense, and would not help the user. In such a case, I think it is better that the program fails with error stop and an informative message.

wclodius2 commented 4 years ago

There are two related questions raised by @nshaffer original post:

  1. What infrastructure should the library provide for handling errors?
  2. What coding guidelines should the authors of the library follow to best handle errors?

To address the first question I have written a module, that I have tentatively named STDLIB_ERROR_REPORTING, and a markdown document describing the module's API in some detail. The module consists of a number of named constants and five subroutines.

The module defines 80+ unique named constants to serve as error codes for reporting errors in a consistent manner. The constants can be defined as integers or a derived type, but are currently defined as a derived type named ERRORS, whose sole public component is named CODE. The advantages and disadvantages of the different means of defining the constants are:

  1. Integers a. Advantage:

    * It is what the Fortran standard currently uses for reporting errors, and is what its users are used to,
    * It requires less typing to enter `INTEGER` than to enter `TYPE(NAME)`, and
    * They can easily be used in a `SELECT CASE` construct

    b. Disadvantage: the users may be tempted to use the processor's STATUS values as their flags.

    * the `STATUS` values the processor returns may duplicate values for the error codes, but with very different meanings
    * the meanings of the processor's `STATUS` values are not readily apparent
  2. Derived type with a public component:

    a. Advantage:

    * The name of the type can be intuitive
    * With a public component it can be used in `SELECT CASE` constructs
    * More difficult to confuse with `STATUS` values

    b. Disadvantage:

    * It is not what users are used to
    * `TYPE(NAME)` requires more typing than `INTEGER`
  3. Derived type with a private component

    a. Advantage

    * The name of the type can be intuitive
    * Very difficult to confuse with STATUS values

    b. Disadvantage

    * It is not what users are used to
    * Cannot be used in `SELECT CASE` constructs
    * `TYPE(NAME)` requires more typing than `INTEGER`

Four of the five subroutines have ERROR as an optional argument for passing error codes. For the fifth ERROR is mandatory. Three subroutines, have optional MODULE and PROCEDURE arguments for specifying the location where the error was discovered. One of the procedures, ASSERT, is similar in intent to CHECK in STDLIB_ERROR, while another, SEND_STOP, is similar in intent to ERROR_STOP in the same module. A summary of the subroutines is as follows:

ASSERT( TEST, MESSAGE [, MODULE, PROCEDURE, ERROR ] ): If TEST is .TRUE. does nothing, otherwise it writes text to the ERROR_UNIT, with an ERROR dependent string as the stop code.

REPORT_ERROR( MESSAGE [, MODULE, PROCEDURE, STAT, ERRMSG, ERROR ] ): Writes the character string, MESSAGE, to ERROR_UNIT, with an ERROR dependent string as the stop code.

REPORT_IO_ERROR( MESSAGE [, MODULE, PROCEDURE, IOSTAT, IOMSG, ERROR ] ): Writes the character string, MESSAGE, to ERROR_UNIT, with an ERROR dependent string as the stop code.

REPORT_TEXT_ERROR( LINE, START_INDEX, DESCRIP [, FILENAME, LINE_NUMBER, WRITE_UNIT, ERROR ] ): Sends a message to WRITE_UNIT describing an error found in a line of text.

SEND_STOP( ERROR ): Stops processing with an ERROR specific string as the stop code. Note this subroutine would be PURE in F2018, but is not PURE in F2008.

As to the second question as to what guidelines programmers should follow I have a few strong opinions on user and memory allocation errors and pure procedures. Programmers should not report user errors up the call chain. They should simply stop with an informative message, or if purity or elemental attributes are important and F2008 compatibility is important, they should do nothing at all. Programmers should almost never attempt to handle memory allocation problems. With virtual memory, allocation should almost never fail, and with lazy allocation on Linux, failure need not be discovered at the invocation of an ALLOCATE statement. They should only attempt to handle an allocation error if:

  1. A large amount of memory is likely to be allocated,
  2. An alternative less memory intensive algorithm is available, and
  3. Most memory is used near the ALLOCATE statement.

Finally as to purity, there is currently nothing to be done to handle errors in F2018 for ELEMENTAL procedures. For PURE procedures in 2018 you can invoke ERROR STOP, or pass a STATUS flag. For the nominal, elemental factorial function with a negative argument, I would return a NaN.

wclodius2 commented 4 years ago

FWIW here is some example code illustrating how I view the module being used:

MODULE EXAMPLE_MOD
  ...
  USE STDLIB_ERROR_REPORTING
  ...
  MODULE_NAME = 'EXAMPLE_MOD'
  ...
CONTAINS
  ...
  SUBROUTINE EXAMPLE_SUB(..., STATUS, ... )
    ...
    TYPE(ERRORS), INTENT(OUT), OPTIONAL :: STATUS
    ...
    SELECT CASE (FLAG)
    ...
    CASE DEFAULT
      IF ( PRESENT(STATUS) ) THEN
        STATUS = CONSISTENCY_FAULT
        RETURN
      ELSE
        CALL REPORT_ERROR('Illegal FLAG value.', &
          MODULE = MODULE_NAME,                  &
          PROCEDURE = 'EXAMPLE_SUB',             &
          ERROR = CONSISTENCY_FAULT )
      END IF
    END SELECT
    ...
    OPEN ( UNIT=LUN, FILE=FILENAME, STATUS='OLD', &
      ERR=99, IOSTAT=STAT, IOMSG=MESSAGE )
    ...
 99 IF (PRESENT(STATUS)) THEN
      STATUS = OPEN_FAULT
      RETURN
    ELSE
      CALL REPORT_IO_ERROR ( TRIM(FILENAME) // &
        " could not be opened as 'OLD'.",      &
        MODULE = MODULE_NAME,                  &
        PROCEDURE = 'EXAMPLE_SUB',             &
        IOSTAT = STAT,                         &
        IOMSG = MESSAGE,                       &
        ERROR = OPEN_FAULT )
    END IF
    ...
100 READ ( UNIT=LUN, END=190, EOR=192, ERR=194,&
           IOMSG=IOMSG, IOSTAT=STAT ) A
    ...
    IF ( PRESENT(STATUS) ) THEN
      STATUS = SUCCESS
    END IF
    RETURN
    ...
190 IF ( PRESENT(STATUS) ) THEN
      STATUS = EOF_FAULT
      RETURN
    END IF
    CALL REPORT_IO_ERROR("End-of-File occurred"//&
      "reading from file '"//TRIM(FILENAME)//"'.",&
      MODULE = MODULE_NAME,                       &
      PROCEDURE = 'EXAMPLE_SUB',                  &
      IOSTAT = STAT,                              &
      IOMSG = MESSAGE,                            &
      ERROR = EOF_FAULT )
192 IF ( PRESENT(STATUS) ) THEN
      STATUS = EOR_FAULT
      RETURN
    END IF
    CALL REPORT_IO_ERROR("End-of-Record occurred reading "//&
      "from file '"//TRIM(FILENAME)//"'.",     &
      MODULE = MODULE_NAME,                    &
      PROCEDURE = 'EXAMPLE_SUB',               &
      IOSTAT = STAT,                           &
      IOMSG = MESSAGE,                         &
      ERROR = EOR_FAULT )
194 IF ( PRESENT(STATUS) ) THEN
      STATUS = READ_FAULT
      RETURN
    END IF
    CALL REPORT_IO_ERROR("Read fault occurred reading "//&
      "from file '"//TRIM(FILENAME)//"'.",      &
      MODULE = MODULE_NAME,                     &
      PROCEDURE = 'EXAMPLE_SUB',                &
      IOSTAT = STAT,                            &
      IOMSG = MESSAGE,                          &
      ERROR = READ_FAULT )
  END SUBROUTINE EXAMPLE_SUB
  ...
END MODULE EXAMPLE_MOD
nshaffer commented 4 years ago

@wclodius2 Thank you for the carefully considered post. I hope you can clarify a few things for me about your approach.

  1. I do not fully understand your comments on processor-defined status values, e.g., for the stat= specifier in allocate statements or the iostat= specifier of I/O statements. At least in the case of iostat values, the named constants in iso_fortran_env seem quite suitable.

  2. In your example, is it intended that users only test for error codes that they expect to handle and let all others fall to the default case? What do they do if they think none of the predefined error codes makes sense for them?

  3. It is not clear to me what is gained from this approach compared to defining and testing for my own status codes. It provides pre-defined status codes and a few utility routines, but does this facilitate error handling in user code or just formalize it?

everythingfunctional commented 4 years ago

@wclodius2 , your example usage looks very similar to a library I developed: erloff

I don't have the routines that output the errors and stop the program, but those are easy to add.

wclodius2 commented 4 years ago

@nshaffer thank you for your kind words.

  1. Thank you for reminding me of the error flags in ISO_FORTRAN_ENV. It has given me a few more error codes to add to my initial list of eighty two. The one problem with the ISO_FORTRAN_ENV flags is that their values can vary arbitrarily between processors, so if you want to pass status flags for multiple categories of problems, ensuring that your values are different for each cause requires replacing their values with your own any way.

  2. It is expected that the writer of the routine that reports the errors to the user documents what errors it passes, that the user decides whether he wants to handle any of them or let failure occur in the called routine, and that the user decide which errors he wants to handle or let fail in the calling routine. I have tried to be very thorough in identifying categories of errors. If a user decides that none of the codes are suitable they can do any of: ask me to add an appropriate error code; if the code were added to the standard library he could add them himself, use the generic FAILURE code; or define their own (presumably integer) flags and document them.

  3. I think a standard library should formalize its error reporting so that users have a consistent interface. As a starting point, I think that having utility routines that report the location of the errors are useful. I want to contribute to the library, but don't want each module to have its own error reporting.

wclodius2 commented 4 years ago

@everythingfunctional I'm glad I am not the only one that approaches error handling this way.

arjenmarkus commented 4 years ago

Wrt the error codes defined in ISO_FORTRAN_ENV we could use code like this to ensure that the error codes in the library do not interfere with the predefined ones for any processor:

! errorcodes.f90 -- ! Use simple calculations to ensure a range outside all predefined error codes ! program errorcodes use iso_fortran_env

integer, parameter :: error_base = 1 + max(abs(iostat_end),

abs(iostat_eor), abs(iostat_inquire_internalunit) ) ! TODO: add the STAT* codes

write(*,*) error_base

end program errorcodes

Out of laziness I have left out the STAT_* error codes :). The program is accepted by both gfortran and Intel Fortran and produces two very different values, 5019 and 91, respectively.

Regards,

Arjen

Op zo 12 jul. 2020 om 23:32 schreef William B. Clodius < notifications@github.com>:

@nshaffer https://github.com/nshaffer thank you for your kind words.

1.

Thank you for reminding me of the error flags in ISO_FORTRAN_ENV. It has given me a few more error codes to add to my initial list of eighty two. The one problem with the ISO_FORTRAN_ENV flags is that their values can vary arbitrarily between processors, so if you want to pass status flags for multiple categories of problems, ensuring that your values are different for each cause requires replacing their values with your own any way. 2.

It is expected that the writer of the routine that reports the errors to the user documents what errors it passes, that the user decides whether he wants to handle any of them or let failure occur in the called routine, and that the user decide which errors he wants to handle or let fail in the calling routine. I have tried to be very thorough in identifying categories of errors. If a user decides that none of the codes are suitable they can do any of: ask me to add an appropriate error code; if the code were added to the standard library he could add them himself, use the generic FAILURE code; or define their own (presumably integer) flags and document them. 3.

I think a standard library should formalize its error reporting so that users have a consistent interface. As a starting point, I think that having utility routines that report the location of the errors are useful. I want to contribute to the library, but don't want each module to have its own error reporting.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/212#issuecomment-657278324, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YRYAUBOTHIYVDT2SVMTR3ITV3ANCNFSM4N4KVJSA .

wclodius2 commented 4 years ago

FWIW my version of gfortran (10.1.0 on the Mac) doesn't define the most recent addition to ISO_FORTRAN_ENV, STAT_FAILED_IMAGE, so an enumeration of error codes that included those defined in ISO_FORTRAN_ENV beyond the I/O ones would have to be processor specific. However, there doesn't seem to be much interest in an enumeration of error codes, whether or not they included those in ISO_FORTRAN_ENV. If there is I could post my enumeration as a starting point.

arjenmarkus commented 4 years ago

Re gfortran 10.1 not defining STAT_FAILED_IMAGE: I was afraid of that ;). And there doesn't seem to be a way to provide a fallback value.

Regards,

Arjen

Op ma 13 jul. 2020 om 15:42 schreef William B. Clodius notifications@github.com:

FWIW my version of gfortran (10.1.0 on the Mac) doesn't define the most recent addition to ISO_FORTRAN_ENV, STAT_FAILED_IMAGE, so an enumeration of error codes that included those defined in ISO_FORTRAN_ENV beyond the I/O ones would have to be processor specific. However, there doesn't seem to be much interest in an enumeration of error codes, whether or not they included those in ISO_FORTRAN_ENV. If there is I could post my enumeration.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.