Open nshaffer opened 4 years ago
Very nice description and, FWIW, I prefer approach 3 as well.
One problem that I have faced a few times in the design of such routines is how this API communicates when nested at multiple levels of depth. Say the user is running a numerical integrator within a root-finding problem that is, in turn, used to solve a system of a differential equations. We might have three levels of depth in this case (admittedly, probably rare and fictitious), but I have personally used (within the scope of stdlib
) two levels of depth in several cases. I'd guess OO design would help in this sense (with separate instances of the "error handler class") but I know it is discouraged in stdlib
.
One thing that we should try to avoid is to force every user to be checking every single function from stdlib to ensure it succeeded. We should figure out how to allow users to check it if they want to and handle the error themselves, but we should also allow users to just call a given stdlib
function and it would fail with a nice error message if something goes wrong.
So I would say a combination of 3. and 1. is the way to go.
The option 2. should be combined with the option 1. I.e., if the optional stat
argument is not present, then it will fail with error stop
. If stat
is present, it would allow the user to handle the error. This is great for functions such as open
or allocate
.
The option 3. is great for functions such as find
to find an element in an array or a string, and return -1 if it does not exist.
Thank you for these clear propositions. It is related to #95 too.
The options 1., 2., and 3. could be used for different purposes/procedures. For example, returning -1
or even NaN
for one of the statistical procedure when a dim
> 15 is provided would make no sense, and would not help the user. In such a case, I think it is better that the program fails with error stop
and an informative message.
There are two related questions raised by @nshaffer original post:
To address the first question I have written a module, that I have tentatively named STDLIB_ERROR_REPORTING, and a markdown document describing the module's API in some detail. The module consists of a number of named constants and five subroutines.
The module defines 80+ unique named constants to serve as error codes for reporting errors in a consistent manner. The constants can be defined as integers or a derived type, but are currently defined as a derived type named ERRORS
, whose sole public component is named CODE
. The advantages and disadvantages of the different means of defining the constants are:
Integers a. Advantage:
* It is what the Fortran standard currently uses for reporting errors, and is what its users are used to,
* It requires less typing to enter `INTEGER` than to enter `TYPE(NAME)`, and
* They can easily be used in a `SELECT CASE` construct
b. Disadvantage: the users may be tempted to use the processor's STATUS
values as their flags.
* the `STATUS` values the processor returns may duplicate values for the error codes, but with very different meanings
* the meanings of the processor's `STATUS` values are not readily apparent
Derived type with a public component:
a. Advantage:
* The name of the type can be intuitive
* With a public component it can be used in `SELECT CASE` constructs
* More difficult to confuse with `STATUS` values
b. Disadvantage:
* It is not what users are used to
* `TYPE(NAME)` requires more typing than `INTEGER`
Derived type with a private component
a. Advantage
* The name of the type can be intuitive
* Very difficult to confuse with STATUS values
b. Disadvantage
* It is not what users are used to
* Cannot be used in `SELECT CASE` constructs
* `TYPE(NAME)` requires more typing than `INTEGER`
Four of the five subroutines have ERROR
as an optional argument for passing error codes. For the fifth ERROR
is mandatory. Three subroutines, have optional MODULE
and PROCEDURE
arguments for specifying the location where the error was discovered. One of the procedures, ASSERT
, is similar in intent to CHECK
in STDLIB_ERROR
, while another, SEND_STOP
, is similar in intent to ERROR_STOP
in the same module. A summary of the subroutines is as follows:
ASSERT( TEST, MESSAGE [, MODULE, PROCEDURE, ERROR ] ): If TEST
is .TRUE.
does nothing, otherwise it writes text to the ERROR_UNIT
, with an ERROR
dependent string as the stop code.
REPORT_ERROR( MESSAGE [, MODULE, PROCEDURE, STAT, ERRMSG, ERROR ] ): Writes the character string, MESSAGE
, to ERROR_UNIT
, with an ERROR
dependent string as the stop code.
REPORT_IO_ERROR( MESSAGE [, MODULE, PROCEDURE, IOSTAT, IOMSG, ERROR ] ): Writes the character string, MESSAGE
, to ERROR_UNIT
, with an ERROR
dependent string as the stop code.
REPORT_TEXT_ERROR( LINE, START_INDEX, DESCRIP [, FILENAME, LINE_NUMBER, WRITE_UNIT, ERROR ] ): Sends a message to WRITE_UNIT
describing an error found in a line of text.
SEND_STOP( ERROR ): Stops processing with an ERROR
specific string as the stop code. Note this subroutine would be PURE
in F2018, but is not PURE
in F2008.
As to the second question as to what guidelines programmers should follow I have a few strong opinions on user and memory allocation errors and pure procedures. Programmers should not report user errors up the call chain. They should simply stop with an informative message, or if purity or elemental attributes are important and F2008 compatibility is important, they should do nothing at all. Programmers should almost never attempt to handle memory allocation problems. With virtual memory, allocation should almost never fail, and with lazy allocation on Linux, failure need not be discovered at the invocation of an ALLOCATE
statement. They should only attempt to handle an allocation error if:
ALLOCATE
statement.Finally as to purity, there is currently nothing to be done to handle errors in F2018 for ELEMENTAL
procedures. For PURE procedures in 2018 you can invoke ERROR STOP
, or pass a STATUS
flag. For the nominal, elemental factorial function with a negative argument, I would return a NaN.
FWIW here is some example code illustrating how I view the module being used:
MODULE EXAMPLE_MOD
...
USE STDLIB_ERROR_REPORTING
...
MODULE_NAME = 'EXAMPLE_MOD'
...
CONTAINS
...
SUBROUTINE EXAMPLE_SUB(..., STATUS, ... )
...
TYPE(ERRORS), INTENT(OUT), OPTIONAL :: STATUS
...
SELECT CASE (FLAG)
...
CASE DEFAULT
IF ( PRESENT(STATUS) ) THEN
STATUS = CONSISTENCY_FAULT
RETURN
ELSE
CALL REPORT_ERROR('Illegal FLAG value.', &
MODULE = MODULE_NAME, &
PROCEDURE = 'EXAMPLE_SUB', &
ERROR = CONSISTENCY_FAULT )
END IF
END SELECT
...
OPEN ( UNIT=LUN, FILE=FILENAME, STATUS='OLD', &
ERR=99, IOSTAT=STAT, IOMSG=MESSAGE )
...
99 IF (PRESENT(STATUS)) THEN
STATUS = OPEN_FAULT
RETURN
ELSE
CALL REPORT_IO_ERROR ( TRIM(FILENAME) // &
" could not be opened as 'OLD'.", &
MODULE = MODULE_NAME, &
PROCEDURE = 'EXAMPLE_SUB', &
IOSTAT = STAT, &
IOMSG = MESSAGE, &
ERROR = OPEN_FAULT )
END IF
...
100 READ ( UNIT=LUN, END=190, EOR=192, ERR=194,&
IOMSG=IOMSG, IOSTAT=STAT ) A
...
IF ( PRESENT(STATUS) ) THEN
STATUS = SUCCESS
END IF
RETURN
...
190 IF ( PRESENT(STATUS) ) THEN
STATUS = EOF_FAULT
RETURN
END IF
CALL REPORT_IO_ERROR("End-of-File occurred"//&
"reading from file '"//TRIM(FILENAME)//"'.",&
MODULE = MODULE_NAME, &
PROCEDURE = 'EXAMPLE_SUB', &
IOSTAT = STAT, &
IOMSG = MESSAGE, &
ERROR = EOF_FAULT )
192 IF ( PRESENT(STATUS) ) THEN
STATUS = EOR_FAULT
RETURN
END IF
CALL REPORT_IO_ERROR("End-of-Record occurred reading "//&
"from file '"//TRIM(FILENAME)//"'.", &
MODULE = MODULE_NAME, &
PROCEDURE = 'EXAMPLE_SUB', &
IOSTAT = STAT, &
IOMSG = MESSAGE, &
ERROR = EOR_FAULT )
194 IF ( PRESENT(STATUS) ) THEN
STATUS = READ_FAULT
RETURN
END IF
CALL REPORT_IO_ERROR("Read fault occurred reading "//&
"from file '"//TRIM(FILENAME)//"'.", &
MODULE = MODULE_NAME, &
PROCEDURE = 'EXAMPLE_SUB', &
IOSTAT = STAT, &
IOMSG = MESSAGE, &
ERROR = READ_FAULT )
END SUBROUTINE EXAMPLE_SUB
...
END MODULE EXAMPLE_MOD
@wclodius2 Thank you for the carefully considered post. I hope you can clarify a few things for me about your approach.
I do not fully understand your comments on processor-defined status values, e.g., for the stat=
specifier in allocate
statements or the iostat=
specifier of I/O statements. At least in the case of iostat
values, the named constants in iso_fortran_env
seem quite suitable.
In your example, is it intended that users only test for error codes that they expect to handle and let all others fall to the default
case? What do they do if they think none of the predefined error codes makes sense for them?
It is not clear to me what is gained from this approach compared to defining and testing for my own status codes. It provides pre-defined status codes and a few utility routines, but does this facilitate error handling in user code or just formalize it?
@wclodius2 , your example usage looks very similar to a library I developed: erloff
I don't have the routines that output the errors and stop the program, but those are easy to add.
@nshaffer thank you for your kind words.
Thank you for reminding me of the error flags in ISO_FORTRAN_ENV. It has given me a few more error codes to add to my initial list of eighty two. The one problem with the ISO_FORTRAN_ENV flags is that their values can vary arbitrarily between processors, so if you want to pass status flags for multiple categories of problems, ensuring that your values are different for each cause requires replacing their values with your own any way.
It is expected that the writer of the routine that reports the errors to the user documents what errors it passes, that the user decides whether he wants to handle any of them or let failure occur in the called routine, and that the user decide which errors he wants to handle or let fail in the calling routine. I have tried to be very thorough in identifying categories of errors. If a user decides that none of the codes are suitable they can do any of: ask me to add an appropriate error code; if the code were added to the standard library he could add them himself, use the generic FAILURE code; or define their own (presumably integer) flags and document them.
I think a standard library should formalize its error reporting so that users have a consistent interface. As a starting point, I think that having utility routines that report the location of the errors are useful. I want to contribute to the library, but don't want each module to have its own error reporting.
@everythingfunctional I'm glad I am not the only one that approaches error handling this way.
Wrt the error codes defined in ISO_FORTRAN_ENV we could use code like this to ensure that the error codes in the library do not interfere with the predefined ones for any processor:
! errorcodes.f90 -- ! Use simple calculations to ensure a range outside all predefined error codes ! program errorcodes use iso_fortran_env
integer, parameter :: error_base = 1 + max(abs(iostat_end),
abs(iostat_eor), abs(iostat_inquire_internalunit) ) ! TODO: add the STAT* codes
write(*,*) error_base
end program errorcodes
Out of laziness I have left out the STAT_* error codes :). The program is accepted by both gfortran and Intel Fortran and produces two very different values, 5019 and 91, respectively.
Regards,
Arjen
Op zo 12 jul. 2020 om 23:32 schreef William B. Clodius < notifications@github.com>:
@nshaffer https://github.com/nshaffer thank you for your kind words.
1.
Thank you for reminding me of the error flags in ISO_FORTRAN_ENV. It has given me a few more error codes to add to my initial list of eighty two. The one problem with the ISO_FORTRAN_ENV flags is that their values can vary arbitrarily between processors, so if you want to pass status flags for multiple categories of problems, ensuring that your values are different for each cause requires replacing their values with your own any way. 2.
It is expected that the writer of the routine that reports the errors to the user documents what errors it passes, that the user decides whether he wants to handle any of them or let failure occur in the called routine, and that the user decide which errors he wants to handle or let fail in the calling routine. I have tried to be very thorough in identifying categories of errors. If a user decides that none of the codes are suitable they can do any of: ask me to add an appropriate error code; if the code were added to the standard library he could add them himself, use the generic FAILURE code; or define their own (presumably integer) flags and document them. 3.
I think a standard library should formalize its error reporting so that users have a consistent interface. As a starting point, I think that having utility routines that report the location of the errors are useful. I want to contribute to the library, but don't want each module to have its own error reporting.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/212#issuecomment-657278324, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YRYAUBOTHIYVDT2SVMTR3ITV3ANCNFSM4N4KVJSA .
FWIW my version of gfortran (10.1.0 on the Mac) doesn't define the most recent addition to ISO_FORTRAN_ENV, STAT_FAILED_IMAGE, so an enumeration of error codes that included those defined in ISO_FORTRAN_ENV beyond the I/O ones would have to be processor specific. However, there doesn't seem to be much interest in an enumeration of error codes, whether or not they included those in ISO_FORTRAN_ENV. If there is I could post my enumeration as a starting point.
Re gfortran 10.1 not defining STAT_FAILED_IMAGE: I was afraid of that ;). And there doesn't seem to be a way to provide a fallback value.
Regards,
Arjen
Op ma 13 jul. 2020 om 15:42 schreef William B. Clodius notifications@github.com:
FWIW my version of gfortran (10.1.0 on the Mac) doesn't define the most recent addition to ISO_FORTRAN_ENV, STAT_FAILED_IMAGE, so an enumeration of error codes that included those defined in ISO_FORTRAN_ENV beyond the I/O ones would have to be processor specific. However, there doesn't seem to be much interest in an enumeration of error codes, whether or not they included those in ISO_FORTRAN_ENV. If there is I could post my enumeration.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.
This issue is to discuss approaches to handle runtime errors and exceptions in stdlib. I have in mind scenarios such as
These are conditions which cannot be identified at compile time, but if left alone will either cause user code to crash or be erroneous. Currently in stdlib, we have
check
, which can print error messages and optionally terminate execution. I think it is not sufficient for general-purpose runtime checking. To me, there are a few major considerations when thinking about runtime checkingUsers must be given the opportunity to recover from the error if at all possible. This is especially important for library code, which may be difficult to debug depending on the installation/distribution. It is also generally rude for library code to kill execution without giving the user any say in the matter.
It should be possible to recover from runtime errors without sacrificing purity. If a routine is manifestly pure, then it should not have to sacrifice the
pure
attribute just to have some error checking. If I write a factorial function, making itpure
and handling negative arguments should not have to be an either/or proposition.Checking and handling errors should not be unduly burdensome to users. If a function call requires one line, and handling its possible error conditions requires ten, users will simply not bother with error checking.
This list of criteria is not exhaustive, but they are the three that are most important to me. That stated, here are the runtime checking approaches I am most familiar with and how they stack up:
Return error code and message as optional out-params.
iostat
,iomsg
)Die with
error stop
Return a special value
index
returning -1Raise an exception
Of these, my preference is strongly toward approach 3 whenever possible.