Elliot's advice for using error-handling

This is meant to share the criteria I used when converting the standard modules to error-handling. It will evolve over time, and longer term it will hopefully become a guide for when and how to use error-handling. For now it just captures my rough guidelines for deciding between error-handling and some other mechanism.

The most important thing is that an unconditional halt is never really acceptable, so I tried to prefer error-handling by default. The second most important thing is that error-handling is not free, either in execution time or in burden on users, so it's not always the right choice. Error-handling isn't overly expensive (an extra branch or two), but that can matter for inner kernels, especially in HPC code.

If you're doing something like file manipulation or computing dates, or most anything outside a kernel, error-handling is almost certainly the right choice. These operations are already heavyweight, so the performance overhead of error-handling is negligible.

One of the other things I considered was what other languages do for similar routines. I tended to look at Swift, Go, Rust, Python, and C/C++. Our error-handling is based on Swift's, so I put the most stock in it, but I also liked to get a general sense of what other languages do. Generally speaking, Python is exception-happy (and often uses exceptions for non-exceptional things), so it's not always helpful, but it was useful for getting a sense of what users have come to expect from existing languages and APIs.

For cases where you can continue execution without correctness implications, I tended to just use warnings. For example, if a user tries to use a negative dataParTasksPerLocale, just emit a warning and assume we can choose the value. Or if somebody calls sleep(-1), just treat that as a no-op and emit a warning. Generally speaking, I tried not to silently ignore the problem, because it usually indicates a user error, but it is something we can recover from.
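To make the warn-and-continue idea concrete, here is a rough sketch in Python (not Chapel, and not how the standard modules are actually written). The names `sleep_seconds` and `resolve_tasks_per_locale` are hypothetical and exist only for illustration.

```python
import time
import warnings


def sleep_seconds(seconds: float) -> None:
    """Hypothetical sleep wrapper: a negative duration is a no-op plus a warning."""
    if seconds < 0:
        # Recoverable: nothing incorrect happens if we simply don't sleep,
        # but the caller probably made a mistake, so don't stay silent.
        warnings.warn(
            f"sleep_seconds({seconds}) ignored: negative duration",
            RuntimeWarning,
            stacklevel=2,
        )
        return
    time.sleep(seconds)


def resolve_tasks_per_locale(requested: int, default: int) -> int:
    """Hypothetical helper: fall back to a default when the request is negative."""
    if requested < 0:
        warnings.warn(
            f"negative tasks-per-locale ({requested}) ignored; using {default}",
            RuntimeWarning,
            stacklevel=2,
        )
        return default
    return requested
```

In both cases execution continues with a sensible value, and the warning tells the user that the argument was probably a mistake.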

For cases where you actually want to report errors and let users handle them, the other main criteria I used were: how frequently will the routine be called, how likely is it to be called with bad arguments, and is the error recoverable? If a routine is going to be called all the time and is unlikely to see bogus values, then a sentinel return value or something similar may be appropriate.
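As a rough illustration of that tradeoff, again in Python rather than Chapel, the sketch below contrasts a sentinel return with a raised error. Both `find_byte` and `parse_port` are hypothetical names invented for this example. The first stands in for a routine called constantly where "not found" is an ordinary outcome; the second stands in for a rarely called routine whose input is likely to be bad.

```python
def find_byte(data: bytes, target: int) -> int:
    """Frequently called lookup: returns -1 as a sentinel when target is absent."""
    for index, value in enumerate(data):
        if value == target:
            return index
    return -1  # sentinel: callers check the result instead of handling an error


def parse_port(text: str) -> int:
    """Rarely called, input often bad: report the problem as an error."""
    try:
        port = int(text)
    except ValueError:
        raise ValueError(f"not a port number: {text!r}") from None
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


# Usage: the sentinel is checked inline; the error is caught and handled.
if find_byte(b"chapel", ord("z")) == -1:
    print("byte not found")

try:
    parse_port("http")
except ValueError as err:
    print(f"recovered from: {err}")
```

The sentinel keeps the hot path to a plain comparison, at the cost of relying on callers to remember the check. The raised error is harder to ignore, which suits the case where bad input is expected and recoverable.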

Misc Swift guides and user experiences that I found useful:

But for all of those, keep in mind that Chapel is a high-performance, parallel language, so sometimes we have to make tradeoffs in the name of performance.

TODO give some concrete examples and link to PRs that converted to error-handling

Elliot's advice for using error-handling #10703