Checked / Typed Exceptions break composition - replace with a single Runtime Exception type and more sophisticated error management

divyekapoor commented 6 months ago

I'm sorry that I'm touching on such deep aspects of language design (and especially one where you've recently made progress). However, the traps that Neat is falling into regarding error management are completely avoidable, and avoiding them is critical for ergonomic operation of the language. (Rust fell into the same trap, and it is now too far along to change this design decision.)

For starters, I'll point to this blog post: https://www.divye.in/2020/06/checked-exceptions-break-composition.html

The main thesis is that checked exceptions break composition. In short: when someone writes code, they control their dependencies only as they exist at that point in time. That is, they rely on the type signature at the time of writing the code. Let's say it's <int, FileNotFoundError> and they happily handle that case. As their underlying library's dependencies evolve (note: possibly 2 or 3 levels deeper), a change in a transitive dependency will implicitly change the type signature of the function, say to <int, FileNotFoundError, OutOfDiskError>. By the very nature of the code written (especially with the ? operator a la Rust), the effective type signature is actually something like <int, Error, FileNotFoundError, OutOfDiskError, ...and others>, and every function that propagates the error inherits that signature. Problem: local changes can cause global changes to the type graph in a cascading way. Ugly.
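To make the cascade concrete, here is a minimal sketch in Python (chosen only for brevity; it is not one of the languages under discussion). All names in it (read_count_v1, read_count_v2, caller, UnhandledVariant) are hypothetical, and the Union return type merely stands in for a typed signature such as <int, FileNotFoundError>. The caller is written against version 1 of a dependency and silently stops being exhaustive once version 2 adds a variant.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class FileNotFound:
    path: str


@dataclass
class OutOfDisk:
    needed_bytes: int


class UnhandledVariant(Exception):
    """Raised when the caller meets an error variant it was never written for."""


# Version 1 of a hypothetical dependency: <int, FileNotFound>.
def read_count_v1(path: str) -> Union[int, FileNotFound]:
    return FileNotFound(path)


# Version 2, after a transitive dependency changed: <int, FileNotFound, OutOfDisk>.
def read_count_v2(path: str) -> Union[int, FileNotFound, OutOfDisk]:
    return OutOfDisk(needed_bytes=4096)


# Written when only version 1 existed; it handles every variant known at that time.
def caller(read_count, path: str) -> str:
    result = read_count(path)
    if isinstance(result, int):
        return f"count={result}"
    if isinstance(result, FileNotFound):
        return f"missing: {result.path}"
    # With checked/typed errors this is where the upgrade forces a code change.
    raise UnhandledVariant(repr(result))
```

Calling caller(read_count_v1, "a.txt") returns "missing: a.txt", while the same caller with read_count_v2 ends in UnhandledVariant. In a statically checked language the failure would surface as a compile error in every function along the propagation path instead.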

In practice, every function has a bivalent implementation:

f: T -> U
f: Error -> Error

The first one is the happy path. The second one is the happy path for error handling (which is equally important!). Beyond a certain level of complexity and depth of transitive dependencies, it's impossible for f to be cogently written with an error type more complex than simply std::Error: anything finer is a maintenance nightmare with every dependency upgrade (e.g., what would you do if an RPC library introduced a new error type RPCFailedDueToOutOfMemoryOnRemoteNode that is only mildly different from RPCFailedDueToSocketExhaustionOnRemoteNode, and that's bubbling up to you through some code paths over which you have no control?).

The introduction of a checked or typed exception system breaks the symmetry between "programming in the small" and "programming in the large". Every such language forces users to write code that "homogenizes" the Error types at some level of abstraction; otherwise ergonomics suffer (especially with dependency upgrades!). Library authors also split on Error type management (see the mess of Rust's Error type being extended with anyhow and thiserror: anyhow is essentially RuntimeException and composes cleanly, thiserror is essentially a checked exception and is a mess; see the crate documentation to judge for yourself). RuntimeExceptions restore composition of functional types. So,

f: T -> T
f: Error -> Error

cleanly composes with

g: T -> U
g: Error -> Error

(and so on... all the way down the function composition chain).
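As a rough illustration of the composition claim, the following Python sketch uses a single error type (the hypothetical class Error, standing in for std::Error or RuntimeException). The helper compose and the functions parse and reciprocal are also made up for this example. Because every stage fails with the same type, stages can be chained without the composite needing to know which stage can fail in which way.

```python
class Error(Exception):
    """Single runtime error type shared by every function (an assumption of this sketch)."""


def parse(text: str) -> int:
    try:
        return int(text)
    except ValueError as exc:
        # Translate the low-level failure and attach local context.
        raise Error(f"cannot parse {text!r} as an integer") from exc


def reciprocal(n: int) -> float:
    if n == 0:
        raise Error("reciprocal of zero is undefined")
    return 1.0 / n


def compose(*fns):
    """Chain functions left to right; errors pass through unchanged."""
    def composed(value):
        for fn in fns:
            value = fn(value)
        return value
    return composed


# The happy path composes as T -> U; the error path composes as Error -> Error.
pipeline = compose(parse, reciprocal)
```

Here pipeline("4") yields 0.25, and both pipeline("0") and pipeline("x") fail with the same Error type, so adding a new stage never changes what the callers of pipeline have to handle.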

We need to look at our goals with Error handling:

1. Whenever there's an error, we need to incentivize the code author to write meaningful code for the error path (in as much detail as they would for the happy path). This means making sure they add all the contextual information from the function body to the Error message for easy debugging. The ? syntax is actively harmful in this respect: it's a lazy way of saying "just pass the error through, even if it's meaningless (e.g., FileNotFound somewhere deep in the stack, RPCFailureError without any request context, etc.); I'm not going to make the effort to attach additional context because it's hard". By using the ? syntax, we actively discourage writing code of the form f: T -> Error that attaches contextual info to the Error.
2. As a rule of ergonomics: there should be no noise in the happy path to handle errors (Go got it wrong with the if err != nil noise, Rust got it wrong with all the ?, and Neat is following Rust here). All of these languages conflate the f: T -> T and f: T -> Error paths by making error handling local to the point of the dependency call when, in reality, almost all error handling is non-local. Surprisingly, Java got it right: the entire main body of the function can be written assuming everything is correct and happy, and the try-catch blocks provide the code for the f: T -> Error transition. (If there's an NPE, it will throw; if there's a dependency error, it will propagate to the catch.)
3. The most important thing to remember is that error handling is not local. It may be local in the small and possibly recoverable; in the large, though, error handling is done over a region of code and is non-recoverable (retries are an approximation, and logging errors and recording metrics are the only other things that can be done). The common pattern is: log the error, increment a metric, and either fail the request or retry.
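The "log the error, increment a metric and either fail the request or retry" pattern can be sketched as one region-level handler. This is a minimal Python sketch under stated assumptions: Error, metrics, logger and handle_request are hypothetical names, and a Counter stands in for a real metrics client.

```python
import logging
from collections import Counter

logger = logging.getLogger("requests")
metrics = Counter()  # stand-in for a real metrics client


class Error(Exception):
    """Single runtime error type (an assumption of this sketch)."""


def handle_request(operation, max_attempts: int = 3):
    """Run operation over a whole region of code with one non-local handler."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            # Happy path: no error-handling noise inside the operation itself.
            return operation()
        except Error as exc:
            # Error path: log, count, then fall through to the next attempt.
            logger.error("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            metrics["request_failures"] += 1
            last_error = exc
    # All retries are used up, so fail the request and keep the cause attached.
    raise Error(f"request failed after {max_attempts} attempts") from last_error
```

The handler does not inspect which kind of error occurred; it treats every failure in the region uniformly, which is the situation the comment argues is the common case.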

How can we fix this?

1. The first thing to note is that if a type is a sum type with Error, the only meaningful thing to do is to (always!) .unwrap(). If the unwrap() fails, transition to the code that implements f: Error -> Error (the catch block!). If it succeeds, continue down the happy path (the try block!). An if-check on whether the return value is an Error is useful extremely rarely, and that scenario can be better handled by a custom Tuple return type. This is exactly what anyhow::Error does.
2. Understand that the ? syntax adds nothing meaningful to the control flow graph. If your dependency has yielded an error, you are going to transition into f: T -> Error, and that transition will always happen. So why bother with the ? noise? It's only for the compiler's type benefit (no human cares whether it's Error or fail and whether it should be ? or ??). The correct solution for the happy path is always to call .unwrap() on the dependency's return value. This leads to the bold outcome: "the language will always unwrap() at all call points, the try paths will only see the T values, and the catch paths will only see the Error values".
3. The representational split between the try blocks and the catch blocks internally in the compiler's AST allows you to compose the two subfunctional paths and optimize / inline them. So composition becomes f.g.h: T -> U (all the try blocks can be inlined together, chained at return points) and f.g.h: Error -> Error (all the catch blocks can be inlined together, chained at throw points; a throw is just a return).

The rare case of g: T -> Error -> U where the catch block actually manages a successful recovery instead of a re-throw is composed on the f.g.h: T->U path with an if-condition and interestingly, this does not break composition! (it adds an if-check similar to the existing code)
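The "always unwrap" rule and the rare recovery case can be sketched in Python as follows. Result, Error, find_user and greeting are hypothetical names for illustration only; the proposal is that the language would insert the unwrap() call itself, whereas this sketch writes it out explicitly.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class Error(Exception):
    """Single runtime error type (an assumption of this sketch)."""


@dataclass
class Result(Generic[T]):
    """A sum of T and Error, modelled with two optional fields."""
    value: Optional[T] = None
    error: Optional[Error] = None

    def unwrap(self) -> T:
        # Failure transitions to the error path (the catch block).
        if self.error is not None:
            raise self.error
        # Success continues down the happy path (the try block).
        return self.value


def find_user(user_id: int) -> Result[str]:
    users = {1: "ada"}
    if user_id in users:
        return Result(value=users[user_id])
    return Result(error=Error(f"no user with id {user_id}"))


def greeting(user_id: int) -> str:
    try:
        # The try path only ever sees the T value.
        name = find_user(user_id).unwrap()
        return f"hello, {name}"
    except Error:
        # Rare case g: T -> Error -> U, where the catch block recovers.
        return "hello, guest"
```

greeting(1) returns "hello, ada" and greeting(2) recovers to "hello, guest"; a caller without a try block would simply let the Error continue upwards.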

The major advantage is that error handling becomes a joy. It's not an accident that Rust has such poor error management and that Java errors are beautiful stacktraces with lots of contextual information. The languages have made specific choices that have produced these outcomes.

In summary, the asks are:

1. Deprecate ? by automatically unwrapping everything (always, no exceptions: there's only 1 way to do errors).
2. Add try-catch support. All functions return T or throw Error (and that's it: simple and clean).
3. Make std::Error have a mandatory structured representation (similar to a log statement: convertible to JSON or string) and a mandatory (automatic) composition on rethrow (unless opted out). E.g., throw std::Error("file not found: {}, error return: {}", filename, error); implicitly captures "caused by ... stacktrace". Backtraces are not the user's problem; they are the language's and runtime's problem (Rust's backtrace crate approach is a mess). The thrown object hierarchy composes a list of JSON objects associated with stacktraces: [err1, err2, err3, ...]. Any catch block can then log the entire stacktrace, or a subset of it, to a structured log via the string representation or the JSON representation.

The important part is that (3) is not expensive because there's no string concat till the final dump. It's just an alternate execution path that keeps a bunch of references around which are compatible and can be optimized. Any error handling code can pattern match against the JSON list. If you'd like to retain the type information, make it part of the error: eg. throw MyFancyError(".....", ...) transforms to throw std::Error("....", ..., context=MyFancyError) and catch blocks can be written against MyFancyError and internally, it's still always std::Error on all the types (essentially it's composition and not inheritance and certain catch blocks may use RTTI to trigger).

The structured JSON list [err1, err2, ...] homogenizes the combinatorially expansive space of errors into a uniform space that is amenable to pattern matching and syntactic sugar.
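A minimal Python sketch of such a structured error is shown below. It is only an approximation of the idea: the class Error, its context parameter, and the methods to_record, chain and to_json are hypothetical names, and Python's own "raise ... from ..." chaining stands in for the automatic composition on rethrow. The message template and its parameters are stored as references and formatted only when the error is finally dumped.

```python
import json


class Error(Exception):
    """Structured error: a template, its parameters and an optional context tag."""

    def __init__(self, template, *params, context=None):
        super().__init__(template)
        self.template = template
        self.params = params      # kept as references, not formatted yet
        self.context = context    # e.g. "MyFancyError", for pattern matching

    def __str__(self):
        # Formatting is deferred until the error is actually printed or dumped.
        return self.template.format(*self.params)

    def to_record(self):
        return {"message": str(self), "context": self.context}

    def chain(self):
        """Return [err1, err2, ...] by walking the 'caused by' links."""
        records, err = [], self
        while err is not None:
            if isinstance(err, Error):
                records.append(err.to_record())
            else:
                records.append({"message": str(err), "context": type(err).__name__})
            err = err.__cause__
        return records

    def to_json(self):
        return json.dumps(self.chain())


def load(filename):
    raise Error("file not found: {}", filename, context="MyFancyError")


def start(filename):
    try:
        load(filename)
    except Error as exc:
        # Rethrow with local context; the cause is linked, not copied.
        raise Error("startup failed for {}", filename) from exc
```

A catch block around start("app.conf") can call chain() and match on the records, for example by checking whether any record has context "MyFancyError", without the function signatures ever mentioning that type. Stack traces are omitted from the records here to keep the sketch short; Python attaches them to each exception separately.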

divyekapoor commented 6 months ago

I apologize for the long issue above. 5-line summary:

1. Most errors are non-recoverable: the default catch block is to log the error and increment the failure metric.
2. Neat implements checked exceptions, which implicitly assumes error recovery is common (contrary to experience; see (1)).
3. Checked exceptions are harmful because they break composition (see blog post).
4. The correct solution is RuntimeExceptions and try-catch blocks.
5. The catch blocks cleanly separate f: T -> T from f: Error -> Error, and the link between the two versions is throw ..., which actively encourages a stacktrace capture and the capture of locally relevant context. This is far superior to ?, which does not encourage writing error flow code.

Continuing down the current path will mean that the community will eventually have to create the "anyhow" crate which everyone will have to agree on as the error type.

See the myriad of reasons why typed errors are bad: https://www.google.com/search?q=checked+exceptions+harmful

(the combinatorial expansion of the error space means that it's futile to assume that people are interested in writing error handling code for all the variety of fine grained ways in which a piece of code can fail - all failures should be handled relatively uniformly: a structured error object serializable into a string or a JSON list is the most uniform way to achieve this goal).

Consequence: please rework error handling to be more similar to Java. They got it right. Rust got it wrong.

FeepingCreature commented 6 months ago

> it's only for the compiler's type benefit (no human cares whether it's Error or fail and whether it should be ? or ??).

Incorrect - ? is only for the reader's benefit. As you note, the compiler can figure it out just fine on its own. ? visualizes scope exits.

I understand what you mean about exceptions, but ... I really don't want to add exception handling. It's a mess from an implementation perspective. I have some plans to improve this, but I don't see it as urgent right now. Fwiw, I think I can fix this easily later because code that uses ? will always stay compatible with code that doesn't, so if this turns out to be an insurmountable issue I can always add a per-package flag for "automatic error propagation" later on. Currently written code will keep working as it currently does; that doesn't commit me to not adding exceptions later. I am aware of the problems you raise though.

FeepingCreature commented 6 months ago

To clarify, the point of the language design right now is to do simple and straightforward things. The package system should make it viable to upgrade the language incrementally in the future.

divyekapoor commented 6 months ago

Thanks for the response! Very much appreciated.

> Fwiw, I think I can fix this easily later because code that uses ? will always stay compatible with code that doesn't, so if this turns out to be an insurmountable issue I can always add a per-package flag for "automatic error propagation" later on. Currently written code will keep working as it currently does; that doesn't commit me to not adding exceptions later.

The main thing is that ? is harmful. Actively so. It encourages poor code and poor libraries. It's a degraded version of a re-throw-without-context and once the ecosystem adopts Result<T, Error>, there's not a lot of going back. I see that you understand this. I will rest my case.

If ? is to be part of the language, then at least force a context capture and serialization of all the function parameters and the line number of the return point as part of the exception path. This too is non-trivial work, though it's better than the status quo. The outcome is going to be "beautiful error messages and stacktraces" of a depth that even Java can't match, all by default.
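What forced context capture at each propagation point might look like can be approximated in Python with a decorator. This is a sketch under stated assumptions, not how Neat or any other language implements it: Error, with_context, parse_port and connect are hypothetical names, and the decorator plays the role that the compiler would play at every ? site.

```python
import functools


class Error(Exception):
    """Single runtime error type that accumulates per-function context."""

    def __init__(self, message):
        super().__init__(message)
        self.frames = []  # one record per function the error passed through


def with_context(fn):
    """Record the function's parameters and line number when an Error leaves it."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Error as exc:
            tb = exc.__traceback__
            # The first traceback entry is this wrapper; the next one is fn itself.
            inner = tb.tb_next if tb is not None and tb.tb_next is not None else tb
            exc.frames.append({
                "function": fn.__name__,
                "args": repr(args),
                "kwargs": repr(kwargs),
                "line": inner.tb_lineno if inner is not None else None,
            })
            raise  # propagate the same error object, now carrying more context
    return wrapper


@with_context
def parse_port(text):
    if not text.isdigit():
        raise Error(f"not a port number: {text!r}")
    return int(text)


@with_context
def connect(host, port_text):
    return (host, parse_port(port_text))
```

After connect("db", "http") fails, the error's frames list holds one record for parse_port and one for connect, each with the serialized arguments and the line at which the error left that function. Note that this relies on every argument having a usable repr, which is the serializability requirement raised later in the thread.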

Thanks for considering the above. Please feel free to close out this ticket.

Cheers!

FeepingCreature commented 6 months ago

My plan (as required) is at some point to change ? to inject local site information into the error. Or just have Error in general accumulate local information on return.

Thanks for the feedback!

divyekapoor commented 6 months ago

> My plan (as required) is at some point to change ? to inject local site information into the error. Or just have Error in general accumulate local information on return.

Sounds good. One point I’ll make about this proposed approach: it will require every single object in the codebase to be serializable to String. Please make sure this is a language feature by default and not an add-on like the Rust serde crate. For structs and classes, it will require implicit codegen similar to #[derive(Debug)] on every struct. Rust got it wrong by making this optional (making all structs noisy and of variable quality with respect to debugging and printing).
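For comparison, Python's standard dataclasses module generates a printable representation for every dataclass unless it is switched off, which is close to the "serializable by default" behaviour requested here. The types Endpoint and Request below are hypothetical; dataclass, asdict and json.dumps are standard library calls.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Endpoint:
    host: str
    port: int


@dataclass
class Request:
    endpoint: Endpoint
    retries: int = 0


req = Request(Endpoint("db.internal", 5432), retries=2)

# String form, generated automatically; nested fields are included.
text_form = repr(req)

# Structured form, converted recursively and then dumped as JSON.
json_form = json.dumps(asdict(req))
```

Here text_form is "Request(endpoint=Endpoint(host='db.internal', port=5432), retries=2)" and json_form is '{"endpoint": {"host": "db.internal", "port": 5432}, "retries": 2}', with no per-type code written by the user.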

Thank you once again for bearing with me. I wish you all the best with Neat. It shows promise.

Neat-Lang / neat

Checked / Typed Exceptions break composition - replace with a single Runtime Exception type and more sophisticated error management #34