proposal: errors: simplified error inspection

rogpeppe commented 5 years ago

Issue https://github.com/golang/go/issues/29934 proposes a way of inspecting errors by traversing a linked chain of errors. A significant part of the implementation of this is about to land in Go 1.13. In this document, I outline some concerns with the current design and propose a simpler alternative design.

Analysis of the current proposal

I have a few concerns about the current design.

Like @crawshaw, I am concerned that the API has not been sufficiently tested yet.
Go already has a type assertion statement. The fact that such a low level package as errors needs to resort to interface{} and reflect-based magic is surely not ideal. Beginners will need to be exposed to this from a very early stage, and it's really not easy to explain.
The implementation of chained errors, with interactions between three interface types, two of which are unnamed, seems quite complex for a fundamental and low level part of the Go standard library. It means that the previously tiny errors package now depends on the fairly hefty internal/reflectlite package.
There are potentially significant runtime costs: every error inspection involves multiple traversals of a linked list and use of reflection. Any use of As implies an allocation.
The use of As and Is methods means that you can't, in general, ask what an error is; you can only ask whether it looks like some other error, which feels like it will make it hard to add definitive error information to log messages.
Dynamic Is method dispatch makes it easy to create unintuitive relationships between errors. For example, context.DeadlineExceeded "is" both os.ErrTimeout and os.ErrTemporary, but os.ErrTimeout "isn't" os.ErrTemporary. A net.OpError that wraps a timeout error "is" os.ErrTimeout but "isn't" os.ErrTemporary. This seems like a recipe for confusion to me.

Although I've been concerned for a while, I did not speak up until now because I had no alternative suggestion that was simple enough for me to be happy with.

Background

I believe that all the complexity of the current proposal and implementation stems from one design choice: the decision to expose all errors in the chain to inspection.

If inspection only checks a single underlying error rather than a chain, the need for Is and As goes away (you can use == and .() respectively), and with them, the need for the two unnamed Is and As interface types. Inspecting an error becomes O(1) instead of O(wrapDepth).

The proposal provides the following justification:

Some error packages intend for programs to act on a single error (the “Cause”) extracted from the chain of wrapped errors. We feel that a single error is too limited a view into the error chain. More than one error might be worth examining. The errors.As function can select any error from the chain; two calls with different types can return two different errors. For instance, a program could both ask whether an error is a PathError and also ask whether it is a permission error.

It seems to me that this justification rests on shaky ground: if you know you have a PathError, then you are in a position to ask whether that PathError also contains a permission error, so this test could be written as:

if is a permission error or (is PathError and the contained error is a permission error)

This seems like it might be clumsy in practice, but there are ways of working around that (see below). My point is that any given error type is still free to expose underlying errors even though there is only one underlying error (or "Cause").

Current state

As of 2019-06-02, the As and Is primitives have been merged into the Go master branch, along with some implementations of the Is method so that OS-related errors can be compared using errors.Is. Error printing and stack frame support were merged earlier in the cycle but those changes have recently been reverted.

The xerrors package implements more of the proposed API, but is still in experimental mode.

Proposed changes to the errors package

I propose that As and Is be removed, and the following API be added to the errors package:

// Error may be implemented by an error value to signify that
// the error value is adding metadata to some underlying error
// (the "E-value").
type Error interface {
    error

    // E returns the underlying error, or E-value. If it returns
    // nil, the receiver itself should be treated as the E-value.
    //
    // Implementations should return an E-value that has no
    // underlying E-value itself, usually by storing E(err) instead
    // of err. Although technically E-values form a chain, the
    // intermediate values in the chain should never be considered
    // for inspection and the chain will almost always have length
    // 1.
    E() error
}

// E returns the "E-value" of an error - the part of the error that
// should be used for error diagnosis.
//
// The E-value, E(err), is E(err.E()) when err implements Error and
// err.E() != nil, otherwise it's err itself.
//
// When writing code that makes a decision based on an error, the
// E-value should always be used in preference to the error value
// itself, because that allows functions to add metadata to the error,
// such as extra message annotations or source location information,
// without obscuring the actual error.
func E(err error) error {
    for {
        err1, ok := err.(Error)
        if !ok {
            return err
        }
        e := err1.E()
        if e == nil {
            return err
        }
        err = e
    }
}

I've used the name E rather than Cause to emphasise the fact that we're getting the actual underlying error; the error being passed around may include more information about the error, but the E value is the only important thing for error inspection. E also reflects the T name in the testing package.

Although the E method looks superficially similar to Unwrap, it's not the same, because error wrappers don't need to preserve the error chain - they can just keep the most recent E-value of the error that's being wrapped. This means that error inspection is usually O(1). The reason for the loop inside the E function is to keep error implementations honest, to avoid confusion and to ensure idempotency: errors.E(err) will always be the same as errors.E(errors.E(err)).

This playground example contains a working version of the above package.
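As a rough, self-contained approximation (not the playground code itself), the sketch below copies the proposed `Error` interface and `E` function locally, since the real errors package has neither. The `withMsg` wrapper, its `Wrap` constructor and `ErrNotFound` are hypothetical. The wrapper keeps the full error for printing but stores `E(err)` for inspection, so even nested wrappers expose an E-value one step away, and `E` stays idempotent.

```go
package main

import (
	"errors"
	"fmt"
)

// Error mirrors the interface proposed above; it is defined locally
// because the real errors package has no such interface.
type Error interface {
	error
	E() error
}

// E mirrors the proposed errors.E function.
func E(err error) error {
	for {
		err1, ok := err.(Error)
		if !ok {
			return err
		}
		e := err1.E()
		if e == nil {
			return err
		}
		err = e
	}
}

// withMsg is a hypothetical metadata wrapper. It keeps the whole error for
// printing, but stores E(err) for inspection so no chain of E-values forms.
type withMsg struct {
	msg  string
	full error // whole wrapped error, used only for the message
	e    error // E-value, used for inspection
}

// Wrap is a hypothetical constructor for withMsg.
func Wrap(err error, msg string) error {
	return &withMsg{msg: msg, full: err, e: E(err)}
}

func (w *withMsg) Error() string { return w.msg + ": " + w.full.Error() }
func (w *withMsg) E() error      { return w.e }

var ErrNotFound = errors.New("not found")

func main() {
	err := Wrap(Wrap(ErrNotFound, "reading row"), "loading user")
	fmt.Println(err)                            // loading user: reading row: not found
	fmt.Println(E(err) == ErrNotFound)          // true
	fmt.Println(E(E(err)) == E(err))            // true: E is idempotent
	fmt.Println(err.(Error).E() == ErrNotFound) // true: E-value is one step away
}
```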

Proposed changes to `os` errors

The changes in this part of the proposal are orthogonal to those in the previous section. I have included this section to indicate an alternative to the current use of Is methods on types throughout the standard library, which are, it seems to me, a significant motivating factor behind the current design.

The standard library has been retrofitted with implementations of the Is method to make some error types amenable to checking with errors.Is. Of the eleven implementations of the Is method in the standard library, all but two are there to implement temporary/timeout errors, which already have an associated convention (an optional Timeout/Temporary method). This means that there are now at least two possible ways of checking for a temporary error condition: check for a Temporary method that returns true, or use errors.Is(err, os.ErrTemporary).

The historic interface-based convention for temporary and timeout errors seems sufficient now and in the future. However, it would still be convenient to have an easy way to check wrapped errors against the errors defined as global variables in the os package.

I propose that a new OSError interface with an associated Error function be added to the os package:

package os

// OSError is implemented by errors that may be associated with
// a global variable error defined in this package.
type OSError interface {
    OSError() error
}

// Error returns the underlying OS error of the
// given error, if there is one, or nil otherwise.
//
// For example, os.IsNotExist(err) is equivalent to
//     os.Error(err) == os.ErrNotExist.
func Error(err error) error {
    err1, ok := errors.E(err).(OSError)
    if ok {
        return err1.OSError()
    }
    return nil
}

Then, instead of a custom Is method, any error type that wishes to (syscall.Errno, for example) can provide an os package error by implementing the OSError method.

This domain-specific check addresses this common case without complicating the whole error inspection API. It is not as general as the current proposal's error wrapping, as it focuses only on the global variable errors in os and not on the wrapper types defined in that package. For example, you cannot use this convention to check if a wrapped error is a *os.PathError. However, in almost all cases, that's what you want. In the very small number of cases where you want to look for a specific wrapper type, you can still do so by manually unwrapping the specific error types via a type switch.

Note that, as with historical Go, there will still be strong conventions about what kinds of errors may be returned from which functions. When we're inspecting errors, we are not doing so blind; we're doing so knowing that an error has come from a particular source, and thus what possible values or types it may have.

Discussion

As with the current proposal, it is important that this design does not break backward compatibility. All existing errors will be returned unwrapped from the standard library, so current error inspection code will continue to work.

This proposal is not as prescriptive as the current proposal: it proposes only a method for separating error metadata from the error value used for inspection. Other decisions as to how errors might be classified are left to convention. For example, an entry point could declare that returned errors conform to the current xerrors API.

The entire world of Go does not need to converge on a single error inspection convention; on the other hand we do need some way of wrapping an error with additional metadata without compromising the ability to inspect it. This proposal provides exactly that and no more.

As an experiment, I implemented this scheme in the standard library. The changes ended up with 330 fewer lines of code (96 fewer lines of production code), and much of the remaining code is simpler.

For example, it seems to me that this code:

// OSError implements the OSError interface by returning
// the OS error for e.
func (e Errno) OSError() error {
    switch e {
    case EACCES, EPERM:
        return oserror.ErrPermission
    case EEXIST, ENOTEMPTY:
        return oserror.ErrExist
    case ENOENT:
        return oserror.ErrNotExist
    }
    return nil
}

is easier to understand than its current equivalent:

func (e Errno) Is(target error) bool {
    switch target {
    case oserror.ErrTemporary:
        return e.Temporary()
    case oserror.ErrTimeout:
        return e.Timeout()
    case oserror.ErrPermission:
        return e == EACCES || e == EPERM
    case oserror.ErrExist:
        return e == EEXIST || e == ENOTEMPTY
    case oserror.ErrNotExist:
        return e == ENOENT
    }
    return false
}

This proposal does not affect the error printing proposals, which are orthogonal and can be implemented alongside this.

Comparison with other error inspection schemes

This proposal deliberately leaves out almost all of the functionality provided by other schemes, focusing only on the ability to discard error metadata. The issue of how to inspect the E-value of an error is left to be defined by any given API.

In this way, the proposed scheme is orthogonal to other error frameworks, and thus compatible with them.

For example, although it does not directly support Unwrap-based chain inspection or error hierarchies, there is nothing stopping any given API from documenting that errors returned from that API support those kinds of error inspections, just as existing APIs document that returned errors may be specific types or values. When bridging APIs with different error conventions, it should in most cases be possible to write an adaptor from one convention to another.

The E-value definition is quite similar to the Cause definition in the errgo package, but it has one important difference - the E-value is always its own E-value, unlike Cause, which can return an error which has another Cause. This eliminates one significant source of confusion, making a strict separation between error metadata and errors intended for inspection. The error metadata can naturally still hold the linked chain of errors, including stack frame and presentation information, but this is kept separate from the E-value - it should not be considered for error inspection purposes.

Summary

This proposal outlines a much simpler scheme that focuses entirely on separating error metadata from the error inspection value, leaving everything else to API convention.

I believe there are significant issues with the current design, and I would prefer that the new errors functionality not make it into Go 1.13, to give us more time to consider our future options.

ConradIrwin commented 5 years ago

I think formatting the error is one way to provide extra context, but I tend to use tools like https://bugsnag.com to provide a much richer view on errors that happen in production.

When looking at https://github.com/golang/proposal/blob/master/design/go2draft-error-printing.md it is strange to me that we're requiring that the author provide both Unwrap and also Format() returning the last error.

The justification given is that library authors may decide not to allow Unwrap but still return the error from Format(). I disagree that this is desirable. In my experience it's extremely difficult to predict what errors callers will care about, because it is environment dependent; and any information that I might use as a programmer to debug the problem should also be available at runtime to decide how to handle it. An error that shows nested errors when printed, but not when inspected at runtime, looks like a bug.

If the error formatting proposal used Unwrap, then library authors would have one fewer interface to implement in the common case that they want to wrap an error and add extra information about the program's intent.

If the error formatting API didn’t need to return an error, then it could be simplified. But improvements to that proposal might be out of scope for this discussion.

creachadair commented 5 years ago

Even leaving aside the orthogonal question of rich formatting for errors, the currently-active API proposal for errors moves away from the principle that "errors are (just) values". With Is/As/Unwrap, errors are not "just values" any longer, but are complex chains of values with sophisticated behaviour.

I personally don't care that much about the philosophical implications of the difference, but I do care about the performance implications: Empirically, a lot of code seems to depend on the assumption—which held for most of Go's existence so far—that checking for specific errors on a hot path (e.g., the recurring "not found" example) is a cheap equality comparison or maybe a (single) type assertion. I have certainly written a lot of Go code with that assumption.

The Is/As/Unwrap API actively encourages nesting—and in cases where you are trying to debug an unexpected error or log details, the cost of that API may not matter since you're probably already in some bad state and are about to unwind the stack a bunch. But in the cases where nothing bad has happened, and all you want is to quickly discriminate "not found" from "permission denied", the difference is potentially substantial.

This aspect seems not to have been discussed much, and I've found it a bit tricky to get good benchmarks for the difference, in part because the new API is still too new to be widely used. What I like about the E-value proposal is that it better separates the concern of error discrimination from the concerns of error decoration and error introspection. This E-value proposal doesn't address all those concerns, but I think it deserves more attention because, by contrast, the Is/As/Unwrap proposal doesn't really allow separating them.

I think it's important to be clear that this proposal isn't about choosing between the current proposal and "not solving those other concerns". However, I haven't seen much attention paid to the cost of the API in balance to the problems it addresses.

rsc commented 5 years ago

This issue arrived after the Is/As/Unwrap proposal (#29934) was accepted. It gave us significant food for thought but ultimately did not lead the authors of that proposal to back it out. Given that Is/As/Unwrap have been accepted, it doesn't make sense to accept this one as well - they are both trying to solve the same problem. I posted more extensive comments above: three different comments starting at https://github.com/golang/go/issues/32405#issuecomment-499533864.

rsc commented 5 years ago

It seems like we should decline this proposal, given that Is/As/Unwrap was accepted. Will leave this open for a week to collect final comments.

MOZGIII commented 5 years ago

This is not bad. Now we get to actually use the language with the As/Is/Unwrap proposal applied, to the fullest, and if we find issues with it we can always "rebase" this proposal in the form of "fixes" to As/Is/Unwrap. The good thing is that, in practice, we might not run into the issues predicted in this discussion. It might be more technically challenging now that Is/As/Unwrap is in, but not impossible.

iand commented 5 years ago

Will leave this open for a week to collect final comments.

Hopefully @rogpeppe will be able to provide a response to your points in that timeframe.

rsc commented 5 years ago

Marked this last week as likely decline w/ call for last comments (https://github.com/golang/go/issues/32405#issuecomment-521001217). Declining now.
