golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.97k stars 17.67k forks source link

proposal: errors: simplified error inspection #32405

Closed rogpeppe closed 5 years ago

rogpeppe commented 5 years ago

Issue https://github.com/golang/go/issues/29934 proposes a way of inspecting errors by traversing a linked chain of errors. A significant part of the implementation of this is about to land in Go 1.13. In this document, I outline some concerns with the current design and propose a simpler alternative design.

Analysis of the current proposal

I have a few concerns about the current design.

Although I've been concerned for a while, I did not speak up until now because I had no alternative suggestion that was simple enough for me to be happy with.

Background

I believe that the all the complexity of the current proposal and implementation stems from one design choice: the decision to expose all errors in the chain to inspection.

If inspection only checks a single underlying error rather than a chain, the need for Is and As goes away (you can use == and .() respectively), and with them, the need for the two unnamed Is and As interface types. Inspecting an error becomes O(1) instead of O(wrapDepth).

The proposal provides the following justification:

Some error packages intend for programs to act on a single error (the “Cause”) extracted from the chain of wrapped errors. We feel that a single error is too limited a view into the error chain. More than one error might be worth examining. The errors.As function can select any error from the chain; two calls with different types can return two different errors. For instance, a program could both ask whether an error is a PathError and also ask whether it is a permission error.

It seems to me that this justification rests on shaky ground: if you know you have a PathError, then you are in a position to ask whether that PathError also contains a permission error, so this test could be written as:

if is a permission error or (is PathError and the contained error is a permission error)

This seems like it might be clumsy in practice, but there are ways of working around that (see below). My point is that any given error type is still free to expose underlying errors even though there is only one underlying error (or "Cause").

Current state

As of 2019-06-02, the As and Is primitives have been merged into the Go master branch, along with some implementations of the Is method so that OS-related errors can be compared using errors.Is. Error printing and stack frame support were merged earlier in the cycle but those changes have recently been reverted.

The xerrors package implements more of the proposed API, but is still in experimental mode.

Proposed changes to the errors package

I propose that As and Is be removed, and the following API be added to the errors package:

// Error may be implemented by an error value to signify that
// the error value is adding metadata to some underlying error
// (the "E-value").
type Error interface {
    error

    // E returns the underlying error, or E-value. If it returns
    // nil, the receiver itself should be treated as the E-value.
    //
    // Implementations should return an E-value that has no
    // underlying E-value itself, usually by storing E(err) instead
    // of err. Although technically E-values form a chain, the
    // intermediate values in the chain should never been considered
    // for inspection and the chain will almost always have length
    // 1.
    E() error
}

// E returns the "E-value" of an error - the part of the error that
// should be used for error diagnosis.
//
// The E-value, E(err), is E(err.E()) when err implements Error and
// err.E() != nil, otherwise it's err itself.
//
// When writing code that makes a decision based on an error, the
// E-value should always be used in preference to the error value
// itself, because that allows functions to add metadata to the error,
// such as extra message annotations or source location information,
// without obscuring the actual error.
func E(err error) error {
    for {
        err1, ok := err.(Error)
        if !ok {
            return err
        }
        e := err1.E()
        if e == nil {
            return err
        }
        err = e
    }
}

I've used the name E rather than Cause to emphasise the fact that we're getting the actual underlying error; the error being passed around may include more information about the error, but the E value is the only important thing for error inspection. E also reflects the T name in the testing package.

Although the E method looks superficially similar to Unwrap, it's not the same, because error wrappers don't need to preserve the error chain - they can just keep the most recent E-value of the error that's being wrapped. This means that error inspection is usually O(1). The reason for the loop inside the E function is to keep error implementations honest, to avoid confusion and to ensure idempotency: errors.E(err) will always be the same as errors.E(errors.E(err)).

This playground example contains a working version of the above package.

Proposed changes to os errors

The changes in this part of the proposal are orthogonal to those in the previous section. I have included this section to indicate an alternative to the current use of Is methods on types throughout the standard library, which are, it seems to me, a significant motivating factor behind the current design.

The standard library has been retrofitted with implementations of the Is method to make some error types amenable to checking with errors.Is. Of the eleven implementation of the Is method in the standard library, all but two are there to implement temporary/timeout errors, which already have an associated convention (an optional Timeout/Temporary method). This means that there are now at least two possible ways of checking for a temporary error condition: check for a Temporary method that returns true, or using errors.Is(err, os.ErrTemporary).

The historic interface-based convention for temporary and timeout errors seems sufficient now and in the future. However, it would still be convenient to have an easy way to check wrapped errors against the errors defined as global variables in the os package.

I propose that an new OSError interface with an associated Error function be added to the os package:

package os

// OSError is implemented by errors that may be associated with
// a global variable error defined in this package.
type OSError interface {
    OSError() error
}

// Error returns the underlying OS error of the
// given error, if there is one, or nil otherwise.
//
// For example, os.IsNotExist(err) is equivalent to
//     os.Error(err) == os.ErrNotExist.
func Error(err error) error {
    err1, ok := errors.E(err).(OSError)
    if ok {
        return err1.OSError()
    }
    return nil
}

Then, instead of a custom Is method, any error type that wishes to (syscall.Errno, for example) can provide an os package error by implementing the OSError method.

This domain-specific check addresses this common case without complicating the whole error inspection API. It is not as general as the current proposal's error wrapping as it focuses only on the global variable errors in os and not on the wrapper types defined that package. For example, you cannot use this convention to check if a wrapped error is a *os.PathError. However, in almost all cases, that's what you want. In the very small number of cases where you want to look for a specific wrapper type, you can still do so by manually unwrapping the specific error types via a type switch.

Note that, as with historical Go, there will still be strong conventions about what kinds of errors may be returned from which functions. When we're inspecting errors, we are not doing so blind; we're doing so knowing that an error has come from a particular source, and thus what possible values or types it may have.

Discussion

As with the current proposal, it is important that this design does not break backward compatibility. All existing errors will be returned unwrapped from the standard library, so current error inspection code will continue to work.

This proposal is not as prescriptive as the current proposal: it proposes only a method for separating error metadata from the error value used for inspection. Other decisions as to how errors might be classified are left to convention. For example, an entry point could declare that returned errors conform to the current xerrors API.

The entire world of Go does not need to converge on a single error inspection convention; on the other hand we do need some way of wrapping an error with additional metadata without compromising the ability to inspect it. This proposal provides exactly that and no more.

As an experiment, I implemented this scheme in the standard library. The changes ended up with 330 lines less code (96 lines less production code), much of it simpler.

For example, it seems to me that this code:

// OSError implements the OSError interface by returning
// the OS error for e.
func (e Errno) OSError() error {
    switch e {
    case EACCES, EPERM:
        return oserror.ErrPermission
    case EEXIST, ENOTEMPTY:
        return oserror.ErrExist
    case ENOENT:
        return oserror.ErrNotExist
    }
    return nil
}

is easier to understand than its current equivalent:

func (e Errno) Is(target error) bool {
    switch target {
    case oserror.ErrTemporary:
        return e.Temporary()
    case oserror.ErrTimeout:
        return e.Timeout()
    case oserror.ErrPermission:
        return e == EACCES || e == EPERM
    case oserror.ErrExist:
        return e == EEXIST || e == ENOTEMPTY
    case oserror.ErrNotExist:
        return e == ENOENT
    }
    return false
}

This proposal does not affect the error printing proposals, which are orthogonal and can be implemented alongside this.

Comparison with other error inspection schemes

This proposal deliberately leaves out almost all of the functionality provided by other schemes, focusing only on the ability to discard error metadata. The issue of how to inspect the E-value of an error is left to be defined by any given API.

In this way, the proposed scheme is orthogonal to other error frameworks, and thus compatible with them.

For example, although it does not directly support Unwrap-based chain inspection or error hierarchies, there is nothing stopping any given API from documenting that errors returned from that API support those kinds of error inspections, just as existing APIs document that returned errors may be specific types or values. When bridging APIs with different error conventions, it should in most cases be possible to write an adaptor from one convention to another.

The E-value definition is quite similar to the Cause definition in the errgo package, but it has one important difference - the E-value is always its own E-value, unlike Cause, which can return an error which has another Cause. This eliminates one significant source of confusion, making a strict separation between error metadata and errors intended for inspection. The error metadata can naturally still hold the linked chain of errors, including stack frame and presentation information, but this is kept separate from the E-value - it should not be considered for error inspection purposes.

Summary

This proposal outlines a much simpler scheme that focuses entirely on separating error metadata from the error inspection value, leaving everything else to API convention.

I believe there are significant issues with the current design, and I would prefer that the new errors functionality not make it into Go 1.13, to give us more time to consider our future options.

bcmills commented 5 years ago

CC @neild @jba

earthboundkid commented 5 years ago

Are the semantics of E() as you propose them the same as pkg/errors.Cause()? If so, why not reuse the method? If not, how do they differ?

rogpeppe commented 5 years ago

@carlmjohnson No, the semantics are different, though not dissimilar. That Cause function returns nil if there's an error with a Cause method that returns nil, which means that errors.Cause of a non-nil error can return nil, which errors.E will not.

Also, the associated conventions are different. In pkg/errors, the Cause chain is used for both error diagnosis and for error metadata (e.g. stack frame information). This proposal keeps the two concepts entirely separate, and does not try to define how metadata is represented. The existing proposal for error formatting is still applicable.

Furthermore, the term Cause is used in different ways in several existing packages, all with subtly different semantics. I thought it was worth using a method with a distinctly different name so that the semantics and conventions are clear and it can be retrofitted to existing code without significant friction.

This was a good question BTW. I should mention this difference in the "comparison" section.

networkimprov commented 5 years ago

@rogpeppe, you might want to drop a note about this issue on golang-dev.

kardianos commented 5 years ago

@rogpeppe

There are potentially significant runtime costs: every error inspection involves multiple traversals of a linked list and use of reflection.

This is true, but I don't typically see this type of error inspection on the hot path.

the use of As and Is methods means that you can't in general ask what an error is; you can only ask whether it looks like some other error, which feels like it will make it hard to add definitive error information to log messages.

If you log an error, call err.Error(). If you want to test if an error is something to do a specific action, then use "Is" or "As". It isn't clear to me what you mean by "definitive". I don't consider the initial error to be "definitive"; the initial error typically has too little information to be useful.

dynamic Is method dispatch makes it easy to create unintuitive relationships between errors. For example, context.ErrDeadlineExceeded "is" both os.ErrTimeout and os.ErrTemporary, but os.ErrTimeout "isn't" os.ErrTemporary. A net.OpError that wraps a timeout error "is" os.ErrTimeout but "isn't" os.ErrTemporary. This seems like a recipe for confusion to me.

I don't understand why this is confusing. Could you clarify?

Here is my use case, why I appreciate the current 1.13 implementation, and where your proposal would fall short:

My applications work with databases a lot. This interaction can have errors at multiple places:

I also tend to use helper functions that may combine information from the SQL text with the SQL error to extract and highlight the line of the error. In this case the SQL driver that handles the error condition won't (and shouldn't) have access to the SQL text. This helper function will wrap the an error with additional query context information.

Furthermore, almost all SQL errors (of all types) are used for business logic and thus sometimes errors have distinct business logic ramifications that may need to be tested for.

So given the above, what would be the "cause", or what would the "definitive" error? It depends on what I'm doing. I sometimes want to look at the "cause" (maybe invalid SQL), but other times I need to see what business logic error and take some action based on that (maybe failed to fetch order).

The current proposal satisfies all these needs with very little code change. This will be much easier to explain to most new Go developers and is similar to what they may be used to from Java or C# with multiple catch statements. After reading your proposal and playground implementation, I don't think it would solve the issues my code bases have with error handling today.

rogpeppe commented 5 years ago

@kardianos

There are potentially significant runtime costs: every error inspection involves multiple traversals of a linked list and use of reflection.

This is true, but I don't typically see this type of error inspection on the hot path.

I think that may depend significantly on the application. Some error kinds (for example "not found" errors) can be very frequent.

the use of As and Is methods means that you can't in general ask what an error is; you can only ask whether it looks like some other error, which feels like it will make it hard to add definitive error information to log messages.

If you log an error, call err.Error(). If you want to test if an error is something to do a specific action, then use "Is" or "As". It isn't clear to me what you mean by "definitive". I don't consider the initial error to be "definitive"; the initial error typically has too little information to be useful.

If I want to understand why my program is behaving in a specific way when handling errors, I can't always stop it at runtime and ask "Is" or "As" questions of the error value. Often I'll need to create a log message that provides some reasonable post-mortem view of the error. With the current proposal, that's not easy to do, because the entire chain of errors contributes to that decision-making process, including dynamic method dispatch to As and Is methods in the chain. Why isn't my code responding to this "As" error correctly? I need to look at the types and values of all the errors in the chain and understand what dynamic "As" checks each one implements.

dynamic Is method dispatch makes it easy to create unintuitive relationships between errors. For example, context.ErrDeadlineExceeded "is" both os.ErrTimeout and os.ErrTemporary, but os.ErrTimeout "isn't" os.ErrTemporary. A net.OpError that wraps a timeout error "is" os.ErrTimeout but "isn't" os.ErrTemporary. This seems like a recipe for confusion to me.

I don't understand why this is confusing. Could you clarify?

"Is" feels like a kind of equality operator. This relationship sounds like A=B and A=C but B ≠ C, which is confusing to me at least.

Here is my use case, why I appreciate the current 1.13 implementation, and where your proposal would fall short:

My applications work with databases a lot. This interaction can have errors at multiple places:

  • Temporary conditions within the database itself (locked table).
  • Invalid SQL.
  • Network partitions and invalid configurations.
  • Errors thrown from SQL procedures.

I also tend to use helper functions that may combine information from the SQL text with the SQL error to extract and highlight the line of the error. In this case the SQL driver that handles the error condition won't (and shouldn't) have access to the SQL text. This helper function will wrap the an error with additional query context information.

Furthermore, almost all SQL errors (of all types) are used for business logic and thus sometimes errors have distinct business logic ramifications that may need to be tested for.

So given the above, what would be the "cause", or what would the "definitive" error? It depends on what I'm doing. I sometimes want to look at the "cause" (maybe invalid SQL), but other times I need to see what business logic error and take some action based on that (maybe failed to fetch order).

The current proposal satisfies all these needs with very little code change. This will be much easier to explain to most new Go developers and is similar to what they may be used to from Java or C# with multiple catch statements. After reading your proposal and playground implementation, I don't think it would solve the issues my code bases have with error handling today.

It seems to me like you're describing an error domain here. It would be up to you (or someone else, of course) to define the conventions for the way that errors work in that domain. This is not unrelated to the approach I've suggested for OS-related errors in the proposal above.

For example, you could define an interface that encapsulates information about an SQL error:

type SQLError interface {
    // DriverErr holds the driver-specific information about the SQL error encountered.
    // Alternatively, you might want this to return a more driver-specific error, such as *pq.Error.
    DriverErr() error
    // SQLText holds the text of the SQL statement that failed. This may return the
    // empty string if the text was not known when the error was returned.
    SQLText() string
}

Then you can implement specific types to represent specific business logic errors, each of which could implement SQLError to make available the underlying SQL error that caused the business logic error. Although, to be honest, if you've diagnosed a business logic error, you probably don't care about the underlying SQL error any more, so it probably wouldn't be that useful to include it.

In your code, you could do:

switch err := errors.E(err).(type) {
case *OrderNotFoundError:
    // Fields in OrderNotFound error could hold info
    // on the underlying SQL error or anything else.
    return http.StatusNotFound
case SQLError:
    // inspect err.DriverErr and SQLText as desired
    return http.StatusInternalServerError
}

The point I'm trying to make with this proposal is that there doesn't have to be a one-size-fits-all approach to error inspection. The only common aspect to all error handling schemes is that we need to be able to add metadata without affecting the error value that we use for making decisions, which is what I've proposed.

I suggest that it's actually quite rare to encounter a situation where we are really mixing and matching a bunch of different kinds of errors from different layers, which is what the current proposal facilitates. Instead, this proposal favours an encapsulation approach where each layer makes its own decision as to what conventions to support. Those conventions can and should be mentioned in the API documentation so the caller knows what kinds of errors to expect and can check accordingly.

It's my intuition that this approach has nicer compositional properties than the current proposal and will scale better to large code bases, making large-scale refactoring less error-prone due to less unexpected interactions between errors in different layers. Not to mention that's it's much simpler. :)

kardianos commented 5 years ago

"Is" feels like a kind of equality operator. This relationship sounds like A=B and A=C but B ≠ C, which is confusing to me at least.


A = io.ReadWriter
B = io.Reader
C = io.Writer

A is B A is C B is not C


This is not confusing to me.

I understand your design rational I believe. It is compelling.

> I suggest that it's actually quite rare to encounter a situation where we are really mixing and matching a bunch of different kinds of errors from different layers, which is what the current proposal facilitates.

I suspect you may underestimate this, or the cost of refactoring other layers / shims to make this work over time.
josharian commented 5 years ago

A is B A is C B is not C

A is a B. A is a C. B is not a C.

“is” is a different relationship than “is a”.

MOZGIII commented 5 years ago

The core idea about just adding metadata without actually changing the actionable error is captured pretty good with this proposal.

However, from a tooling perspective, the implementation doesn't look complete.

From errors.E:

// When writing code that makes a decision based on an error, the
// E-value should always be used in preference to the error value
// itself, because that allows functions to add metadata to the error,
// such as extra message annotations or source location information,
// without obscuring the actual error.

With this in mind, if I have an intermediate value that wraps the raw error with a higher level error, that in turn is also wrapped again, how would I check for that middle error?

I can assert the top-level error with the error value, and the bottom-most with E-value, but to assert on mid-level (intermediate) error there seems to be no way around but to write custom logic for it.

While I agree that current proposal is too complicated (I fully agree that debugging the errors with dynamic dispatch does sound great - it's a solid argument against it) - this one seems to be lacking parts of an API to make life easier - like build-in support for walking around the error chain. Although, it's unclear if it's actually a requirement - maybe it's a good idea to force people to write their own code for that.

creachadair commented 5 years ago

I suggest that it's actually quite rare to encounter a situation where we are really mixing and matching a bunch of different kinds of errors from different layers, which is what the current proposal facilitates.

I suspect you may underestimate this, or the cost of refactoring other layers / shims to make this work over time.

That may be so. However, I think in practice the cases where this level of detail are needed are comparatively uncommon, and while we shouldn't prevent those domains from addressing their issues, I wonder whether it makes sense for the whole Go ecosystem to pay in a more complex API.

Domains where layered inspection is needed will need to define shared abstractions for errors in any case, even with proposal #29934. Some attractive aspects of #32405 here are a smaller, conceptually simpler API, and better separation of concerns: Cases where a chain of errors is really needed can still have their way, but my experience of using the xerrors prototype so far has been that the majority of cases mostly just walk to the end of the chain anyway.

I'd argue that it's preferable to make APIs that really want the complexity to pay the cost for it—and I think the cost wouldn't be that high, relative to an API big enough to care. The design proposal advocated chaining in a fairly general way ("more than one error might be worth examining") without motivating examples.

rogpeppe commented 5 years ago

@MOZGIII

With this in mind, if I have an intermediate value that wraps the raw error with a higher level error, that in turn is also wrapped again, how would I check for that middle error?

If an error is important enough to look at, it is an E-value (i.e. it has no chained E-value below it). If some other error wraps it in the way you're talking about, then that's an E-value too. The E property is not about wrapping - it's about adding metadata. By metadata, I mean extra data that programs should not inspect to determine behaviour, such as stack frame information or the error string. If an error e isn't an E-value, it's just metadata and can always be discarded in favour of errors.E(e) when making programmatic decisions.

When you say that an error A "wraps" another value B, I would say that you have one error, A(B), not two independent errors, (A, B). So in your case, if you really have three significant layers of wrapping, you'll have A(B(C)), not (A, B, C), and if you want to look for the middle one, error A will have to make its wrapped error B available explicitly (that's what the OSError proposal and the SQLError suggestions above both do).

I hope that makes things a little bit clearer than mud :)

rogpeppe commented 5 years ago

@kardianos

I suggest that it's actually quite rare to encounter a situation where we are really mixing and matching a bunch of different kinds of errors from different layers, which is what the current proposal facilitates.

I suspect you may underestimate this, or the cost of refactoring other layers / shims to make this work over time.

You may be right. I think it's worth taking that risk. FWIW we are already in that situation, and I don't see that things are working too badly.

I'm much more concerned about creeping layer interconnection and increasing fragility in large-scale programs than I am about this aspect tbh. YMMV of course.

MOZGIII commented 5 years ago

So, that means I'd have to make my intermediate error an E-value (with it's E() method returning nil), and add the underlying (lower-level) error as some other value (for example Cause, cause it's usually the case, or NonE in listing below).

As an example, given I have the following:

type IntermediateError struct { NonE: error, /* ... */ }
func (e IntermediateError) Error() string { /* ... */ }

func someFn() error {
    // ...
    if err != nil {
        if myCond(errors.E(err)) {
            return &IntermediateError{NonE: err}
        }
        return err
    }
}

Then, somewhere at the much higher level code I can do:

switch v := errors.E(err).(type) {
    case *IntermediateError:
        // Handle high-level error logic, with an option to handle the lower-level error
        // ... logic ...
        // Here I can also access low-level error if needed
        if v.NonE.os.(os.OSError).OsError() == oserror.ErrPermission {
            // logic for permission error that is also an *IntermediateError.
            // only a particular kind of permission error is wrapped as *IntermediateError.
        } else {
            // other case...
        }
    case os.OSError:
        if v.OSError() == oserror.ErrPermission {
             // logic for permission error that is NOT an *IntermediateError
             // (possibly coming from a different, unexpected source).
        }
}

This code snippet is an example of the case I have in mind. With this approach, even though it's much more verbose, it kind of communicates the complexity better that how I'd write it with two lines of code with the current proposal. This feels a lot less magical, I actually like it.

earthboundkid commented 5 years ago

Putting my philosophy degree to work FWIW:

Is has two relevant senses here: identity and predication. If I say, “Bruce Wayne is Batman” that’s an identity statement. They just are two names for the same fictional person. The transitive, symmetric, reflexive properties apply. But if I say, “Batman is a comic book character” that’s predication. The identity properties don’t apply.

It appears that errors.Is means identity but actually it’s just predication. I can see why that’s confusing.

josharian commented 5 years ago

I dropped out of my philosophy program, but FWIW:

It appears that errors.Is means identity but actually it’s just predication.

It’s worse than that—it’s kind of both. The docs:

An error is considered to match a target if it is equal to that target or if it implements a method Is(error) bool such that Is(target) returns true.

creachadair commented 5 years ago

@carlmjohnson

It appears that errors.Is means identity but actually it’s just predication.

@josharian

It’s worse than that—it’s kind of both. The docs:

An error is considered to match a target if it is equal to that target or if it implements a method Is(error) bool such that Is(target) returns true.

In this case that is less a philosophical statement, however, than a straightforward issue of well-foundedness: The recursive definition via the implicit Iser interface needs a base case. 🙂 But it does mean that the definition is tied to that implementation (or one that subsumes it).

rogpeppe commented 5 years ago

@MOZGIII Yup, that's the idea.

Although note that if the code that created *IntermediateError is aware that it's working in the OSError domain, it can implement the OSError interface itself, so the code could be a little simpler (note that in the proposal there's an os.Error function that does the OSError type check for you):

switch v := errors.E(err).(type) {
case *IntermediateError:
    if os.Error(v) == oserror.ErrPermission {
        // logic for permission error that is also an *IntermediateError

You could always switch the logic too, if you only cares about *IntermediateError in the ErrPermission case:

 if os.Error(err) == oserror.ErrPermission {
     if _, ok := errors.E(err).(*IntermediateError); ok {
            // logic for permission error that is also an *IntermediateError.
}
eandre commented 5 years ago

If I'm reading this correctly, the main difference between this and the original proposal, beyond the simplicity aspect, is that this proposal seems to force programs to coordinate a lot more on what I would call an "error hierarchy". That is, if you have more than one type of error that you want to be able to expose, every call site needs to be aware of the "E-value hierarchy".

For example, to use the IntermediateError and oserror.ErrPermission example from above. In the original proposal, any call site can simply errors.Is(err, oserror.ErrPermission) without having to know about the existence of IntermediateError. With this proposal, all call sites must be aware of the hierarchy: IntermediateError is the primary E-value being returned, which may include an OSError inside it. Correctly checking for either, at any point in the program, requires complete knowledge of the full error hierarchy (disregarding errors that only attach metadata).

For small programs I doubt this is a problem. For large programs though, keeping this correct seems really hard. If you have a few different error types to check for it seems hard to guarantee that the same hierarchy is followed through all different call paths of a program. That would lead to programs having to check for the same error type in multiple different ways. In the example above you would have a check for IntermediateError wrapping OSError, but also a check for OSError directly. If you had three types of errors (A, B, and C) you would need to check for A -> B -> C, A -> C, B -> C, and C directly to be entirely sure whether the error is a C.

Lastly it seems to me to introduce the undesirable property that introducing a new type of error that can be checked for will have a large and cascading effect on programs. All callers that check for some other error type, that is now possibly being wrapped in the new type being introduced, will need to change. In large programs I suspect this would be a large amount of code churn for an activity that we presumably want to make easy and even encourage (introducing new error types). Not to mention that even finding all the callers that may somehow receive an error of the new type will be quite difficult in complex programs.

I don't know what the right solution is. I really like the simplicity of this proposal but the above stand out to me as fairly large downsides for large programs. Whether the simplicity is worth that trade-off I cannot say.

flibustenet commented 5 years ago

@eandre for large programs (with long error chain) i would say that's the opposite, adding a new error type in the chain should be more explicit than magical. If not, all error would have been wrapped by default which was rejected by the proposal. But it's really a design choice! I believe simple and explicit fit more here and now (for compatibility with legacy code).

eandre commented 5 years ago

Sorry @flibustenet for not being clear. I'm not saying the act of adding a new error type should be magical. But if a new error type is added that wants the original error to be available (the new error type is "unwrappable" in the words of the original proposal), it seems to me that would lead to a large amount of code changes all over the program in this proposal.

MOZGIII commented 5 years ago

@eandre in my example above, the whole point is for the higher level code to know about both IntermediateError. The goal there is to handle three cases: an IntermediateError that is caused by oserror.ErrPermission, an IntermediateError that's caused by something else, potentially some other kind of error, and the raw oserror.ErrPermission - that may entirely different processing logic than the oserror.ErrPermission wrapped with IntermediateError. Btw, this is why I wouldn't implement OSError for IntermediateError.

With the current proposal, the code will be smaller, but it will be definitely more tricky to find a bug in, if a bug were somewhere in the chain structure. With this proposal it becomes more straightforward, as unwrapping the chain is done manually - similarly to how it was done before the updated error handing proposal.

Also, in my opinion, error types are parts of the public API of a package, and adding new error types, therefore, should not be magical by default anywhere as much as possible. The whole point of explicitly handling error flows (aka lack of exceptions, of the main features of Go) is to force users to think about error conditions in a direct way, without obscuring the details.

eandre commented 5 years ago

@MOZGIII I understood that your example was about that. I gave a different example, where a body of code already exists, and various places check for various types of existing errors. In such a use case, introducing a new error type (IntermediateError in my example) causes a cascade of changes that impacts all existing code that checks for other types of errors.

MOZGIII commented 5 years ago

Lastly it seems to me to introduce the undesirable property that introducing a new type of error that can be checked for will have a large and cascading effect on programs. All callers that check for some other error type, that is now possibly being wrapped in the new type being introduced, will need to change. In large programs I suspect this would be a large amount of code churn for an activity that we presumably want to make easy and even encourage (introducing new error types). Not to mention that even finding all the callers that may somehow receive an error of the new type will be quite difficult in complex programs.

If you add a new meaningful error that wraps the underlying errors for you to be able to alter the control flow based on it (as in my example) - then changing a lot of code to handle it is probably what you're doing it for in the first place. This is equivalent to introducing a completely new, independent error, that you need to handle.

If the error is wrapped in way that only adds additional metadata (like that is was passed through package X), then the wrapper type implements the E() method, and this wrapper does not affect the error handling logic. It does affect printing the error though, and you can make use of additional metadata for debugging purposes.

Additional bonus, is for structured logging purposes, wrapped error can be cleanly split into a part that affects control-flow and metadata, allowing one to implement a logger in a way that unifies representation of the control flow errors and having wrapper errors metadata in a separate, optional fields.

MOZGIII commented 5 years ago

Speaking of metadata though, in the structure logging system I'd probably want to have stack traces for both control-flow affecting and wrapper errors.

eandre commented 5 years ago

I see adding a new error type rather as adding another dimension on the error that you can inspect (at least when the error is "unwrappable"). So any given error may at the same time be a permission error as it is an RPC error. Introducing a new error type SQLError that is "unwrappable" would /add/ a dimension that can be independently queried for. This proposal seems to make that more difficult than the original proposal, so the question to me is whether or not that use case is common.

DmitriyMV commented 5 years ago

With this proposal, all call sites must be aware of the hierarchy: IntermediateError is the primary E-value being returned, which may include an OSError inside it. Correctly checking for either, at any point in the program, requires complete knowledge of the full error hierarchy (disregarding errors that only attach metadata).

As someone who worked on the code that has complex (and sometimes dynamic) error chains, I can't stress this enough. The resulting error handling code in those scenarios is more complex than it has to be. The whole point of the current proposal with Is and As is that I don't need to know how to unwrap the entire chain - I can just verify that expected error is there and make decisions on that.

MOZGIII commented 5 years ago

With this proposal the complexity of error handling is, at least, explicit. I imagine using Is and As would require dedicated testing and custom helper functions nonetheless for complicated cases to ensure the error chain unwrapping is working properly.

It might be that with current proposal, the complexity that is already there in the existing error handing logic doesn't go anywhere (simply because it's the "core" logic to handle the error), but it will be complicated further with the dynamic error coercion. For the large code bases, it then becomes even more difficult to reason about the structure of the error chain, and what error conditions to expect.

And all this is just to write less code. Don't get me wrong - I'm usually all for writing less code, but in this case the underlying complexity of the currently proposed interface is dangerously high for non trivial scenarios. In reality me might end up writing even more code than before to make it reliable.

Maybe with a concrete example we can understand the problematic case for this proposal better.

rogpeppe commented 5 years ago

@eandre Thanks for your thoughtful response. It's really useful to understand how people think about errors and error chaining.

If I'm reading this correctly, the main difference between this and the original proposal, beyond the simplicity aspect, is that this proposal seems to force programs to coordinate a lot more on what I would call an "error hierarchy". That is, if you have more than one type of error that you want to be able to expose, every call site needs to be aware of the "E-value hierarchy".

This is a reasonable characterisation up to a point.

With this proposal, any given program or API can decide for itself how much "error hierarchy" it wishes to support.

If you really need to expose arbitrary error chains to your callers, then the scheme implemented in xerrors would still be possible, I believe. You can document that you're following that convention in your API.

I see adding a new error type rather as adding another dimension on the error that you can inspect (at least when the error is "unwrappable").

Personally, I see this view of errors as problematic. If some error A wraps another error B, that is really not the same as having two independent errors A and B, but is best viewed as one error: "A wraps B".

An example of the kind of bug that can result from this approach: I call a function to get some user info from a database:

userInfo, err := db.GetUser(username)
if err != nil {
    if !errors.Is(err, commonerrors.ErrNotFound) {
       return fmt.Errorf("unexpected error getting user info: %v", err)
    }
    // Special handling for when the user doesn't exist.
}

Unfortunately, the implementation of GetUser reads a configuration file to find some parameters. What happens when the configuration file is not present? The code that reads the config file returns a commonerrors.ErrNotFound error. Then the code above will behave as if the user doesn't exist and do something inappropriate.

This isn't an imaginary scenario: I have seen a real world bug with a very similar root cause. The larger the system, the more likely it is that issues like this can arise.

For large programs though, keeping this correct seems really hard. If you have a few different error types to check for it seems hard to guarantee that the same hierarchy is followed through all different call paths of a program. That would lead to programs having to check for the same error type in multiple different ways. In the example above you would have a check for IntermediateError wrapping OSError, but also a check for OSError directly. If you had three types of errors (A, B, and C) you would need to check for A -> B -> C, A -> C, B -> C, and C directly to be entirely sure whether the error is a C.

I don't think it's as bad as that. Assuming A and B both implement their own intermediate wrapping types (something that's actually relatively rare when you can add contextual text without affecting the inspectible error value), if you're implementing A and B and you know that the underlying operations are os-related, you can implement the os.OSError interface on both types, so you can just do (for example):

 if os.Error(err) == os.ErrNotExist

If A and B don't know that the underlying operation is os-related, then I don't think it's a great idea to expose the underlying directly because of the ambiguity risk above.

I'm concerned that indiscriminate use of chain-based error inspection will make large programs harder to reason about and more fragile. Most of the arguments I've seen for it are based on abstract ideas about why it might be useful, but it's hard to reason about hypotheticals, so I'm particularly interested to hear about concrete cases if you have them.

eandre commented 5 years ago

@rogpeppe thanks, those are all really compelling points, especially the one about interpreting a generic "Not Found" error as something else. I will need to revisit some of my own code that uses such generic errors.

Your comment definitely made me reconsider the whole premise of returning arbitrarily inspectable errors. I think it would be valuable to explore how one might make it more ergonomic to be selective in which types of errors are inspectable. Your proposal does not seem to make it particularly ergonomic, and relies on defining and implementing interfaces like OSError. This makes entire types whitelist other entire classes. I think it would be useful to somehow be able to define more easily, perhaps dynamically at error construction, which classes of errors are inspectable in that situation.

eandre commented 5 years ago

As I was thinking through this some more, to rephrase my second paragraph above, the ideal way of working with errors on an API level is to effectively declare "this API returns an error of kind X in this circumstance, and an error of kind Y in this circumstance", and then have the implementation reflect that declaration by making it clear at all return statements which error kind(s) is being reported at that particular location.

So to me a natural way of reflecting that is through introducing the concept of an "error kind". This seems to match up pretty closely to your E-value idea. But I wonder if it would be easier to understand if the nil semantics would be replaced by returning the receiver to signify that it is its own E-value. The behavior of chaining can still be done through return errors.E(receiver.storedErr).

It would be nice to have some utilities to make certain operations more ergonomic: (1) Adding additional information to an error while changing/specifying its kind. (2) Adding additional context/information to an error without changing (or hiding) its kind. (3) Returning a new error that doesn't expose the "wrapped" error(s)

fmt.Errorf seems to be the canonical way to do (3). It would be nice to get easy and ergonomic ways to do (1) and (2).

And while I'm at it I guess Kind() error is more intuitive to me than just E() error, but that's getting into bikeshedding territory.

rogpeppe commented 5 years ago

As I was thinking through this some more, to rephrase my second paragraph above, the ideal way of working with errors on an API level is to effectively declare "this API returns an error of kind X in this circumstance, and an error of kind Y in this circumstance", and then have the implementation reflect that declaration by making it clear at all return statements which error kind(s) is being reported at that particular location.

Yes! This is exactly what I consider to be the right approach. We should pay more attention to the kinds of errors that might be returned. They are as much a part of an API as any other, and should be documented as such, particularly because they are hidden from the Go type system.

This is the philosophy I had in mind when I designed the errgo/errors package. The idea there is that you express the set of possible returned errors by defining a function func(error)bool that reports whether a given error should be exposed to callers. By default, the E-value (as I would call it now) is hidden. This keeps you honest: if don't return an error of a particular kind, no-one can rely on it.

I'm not saying that everyone needs to follow such a rigorous approach, but it's worked well IMHO in some reasonably significant projects. Sometimes you need to shrug and just use Any, but those occasions are surprisingly rare (and obvious in the code!).

It would be nice to have some utilities to make certain operations more ergonomic:

Yes. I don't believe those operations necessarily need to exist in the errors package itself, but they could do.

(1) Adding additional information to an error while changing/specifying its kind.

The errgo/errors package does this with Because, but if this proposal were accepted, I'd probably rename that (perhaps to WithE or even As :) ).

(2) Adding additional context/information to an error without changing (or hiding) its kind.

That's fmt.Errorf("more info: %w", err). If you want to expose more non-textual metadata though, you'd need to define your own type.

(3) Returning a new error that doesn't expose the "wrapped" error(s)

That's fmt.Errorf("%v", err).

And while I'm at it I guess Kind() error is more intuitive to me than just E() error, but that's getting into bikeshedding territory.

Yup. There's no right word here, which is why I opted for the more semantically neutral E, so that we could try to focus on how this works rather than shades of implied meaning of English words... I have spent far too much time bikeshedding exactly this topic in the past.

MichaelTJones commented 5 years ago

meta observation: this is among the most beautifully-discussed design considerations ever. it is the ability of the Go community to foster such conversations that creates its strength and excellence.

mirtchovski commented 5 years ago

meta observation: this is among the most beautifully-discussed design considerations ever

rog is just that good :D (sorry, couldn't resist digressing)

rsc commented 5 years ago

Thanks @rogpeppe for this well-considered proposal and thanks everyone for the discussion here. Like @MichaelTJones, I am truly impressed by the quality of this discussion.

Roger's main argument as I understand it that there should be a single answer to "what error is this really?", and the concept of the E-value is meant to capture "what the single answer is".

Ultimately I think the decision about whether this is a viable approach reduces to whether it is in fact the case that there is a single answer to "what error is this really?". The Error Values problem overview we wrote last summer gives this example:

In a complex program, the most useful description of an error would include information about all the different operations leading to the error. For example, suppose that the error writing to the database earlier was due to an RPC call. Its implementation called net.Dial of "myserver", which in turn read /etc/resolv.conf, which maybe today was accidentally unreadable. The resulting error’s Error method might return this string (split onto two lines for this document):

write users database: call myserver.Method: \ dial myserver:3333: open /etc/resolv.conf: permission denied

The implementation of this error is five different levels (four wrappings):

  1. A WriteError, which provides "write users database: " and wraps
  2. an RPCError, which provides "call myserver.Method: " and wraps
  3. a net.OpError, which provides "dial myserver:3333: " and wraps
  4. an os.PathError, which provides "open /etc/resolv.conf: " and wraps
  5. syscall.EPERM, which provides "permission denied"

There are many questions you might want to ask programmatically of err, including: (i) is it an RPCError? (ii) is it a net.OpError? (iii) does it satisfy the net.Error interface? (iv) is it an os.PathError? (v) is it a permission error?

The first problem is that it is too hard to ask these kinds of questions. The functions os.IsExist, os.IsNotExist, os.IsPermission, and os.IsTimeout are symptomatic of the problem. They lack generality in two different ways: first, each function tests for only one specific kind of error, and second, each understands only a very limited number of wrapping types. In particular, these functions understand a few wrapping errors, notably os.PathError, but not custom implementations like our hypothetical WriteError.

Roger's proposal seems to take the position that one should not be asking that variety of questions. There should be one answer: either errors.E says this error is a WriteError, or it says it is an RPCError, or it says it is an OpError, or a PathError, or a EPERM. There is no flexibility to be more than one. Even if we just simplify to a PathError containing EPERM, you want to be able to type assert for PathError and also check for EPERM. Those questions can't both be answered directly by one error result: either errors.E(err) == EPERM or errors.E(err).(*os.PathError) succeeds, but not both.

For the specific case of PathError and EPERM, Roger's proposal suggests adding a second kind of unwrapping, the OSError method plus os.Error helper function, that fetches the underlying OS error. In general, then, it seems that there would need to be an unwrap method convention and helper function for every axis of error kind. This is certainly more explicit, and Roger and others have said they appreciate how explicit everything becomes. In contrast, the concept of a chain that you walk and test to answer questions intentionally avoids that complexity. Ultimately there is a tradeoff: do you prefer the complex code and explicitness or do you prefer the simpler code and implicitness?

In general, for the language spec and standard library, we are focused on establishing shared conventions that the ecosystem benefits from agreeing on, usually because it promotes interoperability.

The Is/As/Unwrap proposal suggests agreeing on those three methods, which seem to have enough flexibility to accommodate a wide variety of behaviors that might be appropriate in different packages, including returning errors that can be unwrapped and those that cannot. This E proposal suggests a narrower method set, with the effect of narrowing the behaviors that can be implemented in it and also making it more difficult to evolve errors as packages evolve. For example, you can't start returning a different "stopping point" error from the E method, because existing direct tests of that error will start failing. In contrast, if you needed to update the specific error type with Is/As/Unwrap, you could make the new type answer the Is/As questions (by implementing those methods) to continue to look like the old error too. That kind of evolvability seems important.

Relying on language-level == and .(T) is attractive from considerations of parsimony but it cuts off this evolvability. It's one of the key reasons people aren't happy with the current status quo. And the error type you want for == and for .(T) are usually different, so one function - errors.E(err) - can't gracefully provide the left hand side for both.

I don't want this reply to sound like cutting off the conversation. Again, I have been impressed by the quality of this conversation and I hope it continues. I simply wanted to point out that the generality and comparatively higher complexity of Is/As/Unwrap is very much intentional, and try to explain why.

creachadair commented 5 years ago

I think one key idea behind Roger's proposal has perhaps been understated. That is, that given a tower of error values—e.g., the WriteError→RPCError→net.OpError→os.PathError→syscall.Errno example—there are many questions one could ask about the types of these values, but that it is not clear that their answers are independent of each other.

In other words: When does it make sense to ask "does this chain contain syscall.EINTR?" or "does this chain contain any *os.PathError values?" without also knowing the context in which that error occurred? The chain captures this information, but the API from the Is/As/Unwrap proposal does not make that accessible. In fact, we could make the case that the API actively discourages inspecting the context, since it would require manually unwrapping the whole chain.

The example of the "not found" clash from https://github.com/golang/go/issues/32405#issuecomment-498749730 illustrates this: If I am a particular program, it is likely not safe for me to assume that any *os.PathError is the one I cared about, unless I also know that the errors between me and it transit only one path-related lookup.

Roger's proposal does not solve that problem, but does I think better separate the concern of metadata wrapping from that of error-type judgement, which the Is/As/Unwrap proposal folds together. Is/As/Unwrap is very general, but in my (small) trials of xerrors so far I have found it complicates the common trivial cases—hot path cases where I really don't want to unwrap a whole chain just to discover the file did not exist.

Whether or not this is the right solution, I think it's important not to lose that kernel of insight, that error chains are not broadly context-free.

MOZGIII commented 5 years ago

Is/As/Unwrap methods implement a pattern that can be applied to other values, not just error types. When I thought about it, I asked myself: do I want to have that pattern available for any types in the language? Probably, in general, yes, especially Is/As part, but in a very limited and well-thought way. In some cases it would be meaningful and useful, and it would be great to have something language-wide for that.

However, I currently don’t see a way to prevent people from abusing this pattern. I’m afraid it becomes a wide-spread alternative to interfaces - it has the same property that it allows generalization, but I think we already have enough of that in the language (with interfaces).

Perhaps we should consider Is/As a general, not-just-errors thing?

Going back to just errors - the same logic applies. Errors are just values after all, and even though those values are often used in a particular context - there’s a potential a way to abuse the feature.

That said, I really like helper functions like IsNotFoundError(err error) bool, or AsNotFoundError(err error) *ExportedErrType - cause with them you can actually break the coupling between the value you return in the API, and the semantics of that value. Yes, I value explicitness, and with this approach you provide an explicit way to assert and handle the error.

I feel like this proposal is exactly what I need - as it solves one of the major pain points of wrapping the errors with custom metadata - which typically complicates the actual control flow.

MOZGIII commented 5 years ago

Btw, what does As do if there are multiple *PathErrors in the chain?

creachadair commented 5 years ago

Btw, what does As do if there are multiple *PathErrors in the chain?

Per https://godoc.org/golang.org/x/xerrors#As:

// As finds the first error in err's chain that matches the type to which target points […]
MOZGIII commented 5 years ago

Finding the first error in the chain might not work for every case. If we are to end up with As in the language - we probably need some support for building a more customized chain inspection logic.

For example, if there's a chain of errors *TxRollBackError -> *PathError -> *MainLogicError -> *PathError, and I want to find a *PathError that's below *MainLogicError, and not the one that's below *TxRollBackError. This example might be a bit artificial, but the presence of the same error type in chain, unless it's actually hidden from the chain unwinding mechanism, may be difficult to deal with. And if we wrap the *PathError that caused *TxRollBackError with something that doesn't show up in As properly - then the whole point of the unified unwrapping mechanism is kind of lost.

rogpeppe commented 5 years ago

@rsc Thanks Russ for your thoughtful reply.

Roger's main argument as I understand it that there should be a single answer to "what error is this really?", and the concept of the E-value is meant to capture "what the single answer is".

Yes, that is my main argument but I don't think that the "single answer" restriction is as restrictive as you're making it out to be. I believe Go's interface type assertions provide a neat way of providing evolvability without needing to have universal chain-based error inspection.

Ultimately I think the decision about whether this is a viable approach reduces to whether it is in fact the case that there is a single answer to "what error is this really?". The Error Values problem overview we wrote last summer gives this example:

In a complex program, the most useful description of an error would include information about all the different operations leading to the error. For example, suppose that the error writing to the database earlier was due to an RPC call. Its implementation called net.Dial of "myserver", which in turn read /etc/resolv.conf, which maybe today was accidentally unreadable. The resulting error’s Error method might return this string (split onto two lines for this document):

write users database: call myserver.Method: \
    dial myserver:3333: open /etc/resolv.conf: permission denied

The implementation of this error is five different levels (four wrappings):

  1. A WriteError, which provides "write users database: " and wraps
  2. an RPCError, which provides "call myserver.Method: " and wraps
  3. a net.OpError, which provides "dial myserver:3333: " and wraps
  4. an os.PathError, which provides "open /etc/resolv.conf: " and wraps
  5. syscall.EPERM, which provides "permission denied"

There are many questions you might want to ask programmatically of err, including: (i) is it an RPCError? (ii) is it a net.OpError? (iii) does it satisfy the net.Error interface? (iv) is it an os.PathError? (v) is it a permission error?

The first problem is that it is too hard to ask these kinds of questions. The functions os.IsExist, os.IsNotExist, os.IsPermission, and os.IsTimeout are symptomatic of the problem. They lack generality in two different ways: first, each function tests for only one specific kind of error, and second, each understands only a very limited number of wrapping types. In particular, these functions understand a few wrapping errors, notably os.PathError, but not custom implementations like our hypothetical WriteError.

I think that this is a great example actually. Let's say I write some code that's calling the database-writing function and checking the returned error:

if err := db.WriteSomething(data); err != nil {
    if errors.Is(err, os.ErrPermission) {
        doSomething()
    }
}

What do I mean when I write this code? Am I checking whether I don't have permissions to write to the database? Am I checking whether I don't have permissions to make any RPC call? Am I checking whether I don't have permissions to make a network call? Am I checking whether I don't have permissions to read the /etc/resolv.conf file?

The answer is potentially: all of these things! Every level shown there could potentially wrap an error for which errors.Is(err, os.ErrPermission) returns true.

Now let's say that I'm really just interested in checking whether the client doesn't have permission to write to the database. Do we really think it's OK for my code to call doSomething when the actual error is that the local resolv.conf file cannot be opened? I'd suggest not - that seems to me like an entirely different kind of error that would require a very different kind of response. But the current proposal would conflate the two kinds of error. ISTM that this is a fundamental property of this approach to error checking and it seems fragile to me.

This is made worse by the fact that errors are inherently sporadic. Let's say I'm reviewing the above code. I observe that there are appropriate unit tests and that we've got 100% code coverage. I approve the review. Then in 6 months' time, someone has this issue with permissions on their resolv.conf file. It results in strange behaviour because this code path is triggered.

Maybe the other code path doesn't even exist at the time of review. Later on, someone added new code in some dependency and broke production in some unlikely (but potentially important) edge case.

Roger's proposal seems to take the position that one should not be asking that variety of questions. There should be one answer: either errors.E says this error is a WriteError, or it says it is an RPCError, or it says it is an OpError, or a PathError, or a EPERM. There is no flexibility to be more than one. Even if we just simplify to a PathError containing EPERM, you want to be able to type assert for PathError and also check for EPERM. Those questions can't both be answered directly by one error result: either errors.E(err) == EPERM or errors.E(err).(*os.PathError) succeeds, but not both.

This is true when you're talking about concrete types and values, but ISTM that this is where Go's interface types can work nicely. In this case, both questions can indeed by answered directly by one error result: errors.E(err).(os.OSError).

For the specific case of PathError and EPERM, Roger's proposal suggests adding a second kind of unwrapping, the OSError method plus os.Error helper function, that fetches the underlying OS error. In general, then, it seems that there would need to be an unwrap method convention and helper function for every axis of error kind. This is certainly more explicit, and Roger and others have said they appreciate how explicit everything becomes. In contrast, the concept of a chain that you walk and test to answer questions intentionally avoids that complexity. Ultimately there is a tradeoff: do you prefer the complex code and explicitness or do you prefer the simpler code and implicitness?

You seem to be asserting that being explicit results in more complex code. In fact, I found that when I changed the standard library to be explicit instead of using Is, the code became simpler and shorter (45 lines less production code, not counting the removal of the Is/As functions themselves).

ISTM that the worry with the explicitness here is not that the code becomes more complex (it doesn't, it seems), but that the error checking code is less general and that the error-returning code will not be able to evolve to fit future requirements.

For example, you can't start returning a different "stopping point" error from the E method, because existing direct tests of that error will start failing. In contrast, if you needed to update the specific error type with Is/As/Unwrap, you could make the new type answer the Is/As questions (by implementing those methods) to continue to look like the old error too. That kind of evolvability seems important.

I agree that with my proposal, when you use ==, or .() with a concrete type, that the code you're calling is not free to add intermediate error-wrapping types without breaking backward compatibility. Use of concrete types in error checking could perhaps be considered analagous to using a concrete non-struct type in any package API - it locks you down, but often that's just fine. Most code does not have a need to add additional inspectible error data when it wraps errors.

I believe there is a trade-off here: I don't think we can provide the freedom to insert arbitrary intermediate types in the chain without also running into the kind of ambiguity issues that I've tried to illustrate earlier in this post.

Relying on language-level == and .(T) is attractive from considerations of parsimony but it cuts off this evolvability. It's one of the key reasons people aren't happy with the current status quo. And the error type you want for == and for .(T) are usually different, so one function - errors.E(err) - can't gracefully provide the left hand side for both.

With some care, it is possible to expose errors which are evolvable. If we assert errors for behaviour, not type, then when we add an intermediate type, we can make it implement the expected interface.

For example, as I mentioned earlier, err.(os.OSError) allows checking both syscall.EPERM and for an *os.PathError that wrapps os.ErrPermission. This seems graceful enough to me. Yes, the types in question need to have explicitly opted in to this particular form of wrapping, but I think that's actually a Good Thing.

I'm interested to hear about real-world cases of this class of evolvability problem. I think that just as with any other part of an API, it will usually be possible to evolve in a backwardly compatible way, by using different entry point names, providing opt-in flags, etc.

I don't want this reply to sound like cutting off the conversation. Again, I have been impressed by the quality of this conversation and I hope it continues. I simply wanted to point out that the generality and comparatively higher complexity of Is/As/Unwrap is very much intentional, and try to explain why.

Thanks again for your thoughts. They have made me more aware that there is a trade-off to made here, even though on balance, I still think that it would be better to go with the simpler approach I've outlined in this proposal. Note that if we go for the E-value approach, it is still compatible with the Unwrap and friends. I don't think the converse is true though, which is why I think it would be worth at least pausing for a little longer consideration rather than publishing the errors package as-is in Go 1.13.

eandre commented 5 years ago

I think the problem of "what does this permission error mean" is more a result of (1) careless usage of unwrappable errors, and (2) over-use of ambiguous or generic errors.

In the database example, we should not repurpose errors that signify OS-level permission denied to mean application level "user doesn't have access to perform that database operation". Similarly, a database should not carelessly expose unwrappable errors.

It seems to me that these guidelines must be followed for both proposals equally.

rsc commented 5 years ago

@rogpeppe, thanks for the reply. Two small points.

- In the 5-level example, I agree that there is potential for layering confusion in the example from the errors problem overview. I think that one is a bit extreme, and in some ways I regret it, but it's the one I had at hand. In general I think we need to establish a kind of best practice that errors returned by a particular package should probably not expose internal problems to Is/As/Unwrap. In a real system I would probably draw an "error opacity" line between 1+2 and 3+4+5, so that the top-level error would not be a "permission denied" in that sense. The smaller examples, things like a plain PathError wrapping EPERM, are sufficient in my mind to exclude any kind of "function returning an error to use in == or .(T)", including errors.E. More on that in a separate followup comment.

- I've been involved in a handful of efforts over the years to try to define 'common error spaces' in different languages for programmatic inspection. In each case, I felt like we did the best we could in that context at that time, but none of them has ever reached anything I was completely happy with, including this one. I think Is/As/Unwrap is an important, needed improvement over the current status quo in Go, and I think it's the best we can do in this context at this time, but it's certainly not perfect. I don't know what perfect is here, and I freely admit that.

1/3

rsc commented 5 years ago

@rogpeppe's initial post says:

For example, it seems to me that this code:

// OSError implements the OSError interface by returning
// the OS error for e.
func (e Errno) OSError() error {
  switch e {
  case EACCES, EPERM:
      return oserror.ErrPermission
  case EEXIST, ENOTEMPTY:
      return oserror.ErrExist
  case ENOENT:
      return oserror.ErrNotExist
  }
  return nil
}

is easier to understand than its current equivalent:

func (e Errno) Is(target error) bool {
  switch target {
  case oserror.ErrTemporary:
      return e.Temporary()
  case oserror.ErrTimeout:
      return e.Timeout()
  case oserror.ErrPermission:
      return e == EACCES || e == EPERM
  case oserror.ErrExist:
      return e == EEXIST || e == ENOTEMPTY
  case oserror.ErrNotExist:
      return e == ENOENT
  }
  return false
}

I agree about the scannability, but we could easily rewrite the current code to look more like Roger's:

func (e Errno) Is(target error) bool {
    switch e {
    case EINTR, EMFILE:
        return target == oserror.ErrTemporary
    case EAGAIN, EWOULDBLOCK, ETIMEDOUT: // and EBUSY on Plan9 
        return target == oserror.ErrTimeout || target == oserror.ErrTemporary
    case EACCES, EPERM:
        return target == oserror.ErrPermission
    case EEXIST, ENOTEMPTY:
        return target == oserror.ErrExist
    case ENOENT:
        return target == oserror.ErrNotExist
    }
    return false
}

What this highlights is that the main difference is not that one approach inherently produces nicer implementation methods but that ErrTemporary in particular is strange, and that it is probably a mistake to try to use Is to check for error properties as opposed to error kinds, or at least that we should be more careful about distinguishing the two.

I think maybe we should drop ErrTemporary from the Go 1.13 changes. I was going to suggest ErrTimeout as well, but mapping EAGAIN/EWOULDBLOCK/ETIMEDOUT/EBUSY to ErrTimeout all seem pretty direct. That is, timeout seems like a specific kind of error, something you could reasonably return from a function as a description of what went wrong; in contrast, temporary is only an adjective for certain errors that does not stand by itself.

If we did remove ErrTemporary, then the current code would simplify further to:

func (e Errno) Is(target error) bool {
    switch e {
    case EAGAIN, EWOULDBLOCK, ETIMEDOUT: // and EBUSY on Plan9 
        return target == oserror.ErrTimeout
    case EACCES, EPERM:
        return target == oserror.ErrPermission
    case EEXIST, ENOTEMPTY:
        return target == oserror.ErrExist
    case ENOENT:
        return target == oserror.ErrNotExist
    }
    return false
}

which I think is pretty much at the same level of simplicity as Roger's initial version.

I've filed #32463 to discuss removing ErrTemporary.

2/3

rsc commented 5 years ago

As I mentioned in 1/3 above, I think PathError containing EPERM is instructive to consider as far as whether the Is/As/Unwrap generality/complexity is truly needed or we could use the less general, less complex errors.E instead.

Package io is meant to define its own concepts, independent of os. An operating system is one way to do I/O, but there is lots of alternate I/O in Go as well (for example, all the various functions that return an io.Reader or io.Writer not backed by an *os.File). In particular, package io defines io.EOF (and io.ErrUnexpectedEOF).

Package os defines

type PathError struct {
    Op string
    Path string
    Err error
}

The methods on *os.File return *os.PathErrors, for example &PathError{"read", "/etc/passwd", ErrPermission}. This fits the general rule of functions and methods including relevant known context in their errors, so that callers do not need to add context the callee already knew. Code can write:

if _, err := f.Write(buf); err != nil {
    log.Fatal(err)
}

and be confident that the error will include the file name.

The exception to this general rule is io.EOF, because wrapping it in a PathError would made it much more difficult to write what today is:

n, err := f.Read(buf)
if err == io.EOF {
    return
}
if err != nil {
    log.Fatal(err)
}

If we'd had errors.Is at the start, we could have let the *os.File methods return a *os.PathError even for EOF, and err == io.EOF would become errors.Is(err, io.EOF).

What about with errors.E? Because some code may want to get at the actual *os.PathError and inspect the path, errors.E must return the *os.PathError, not io.EOF. This (errors.E) proposal suggests adding a separate OSError method and os.Error helper to get at the underlying error, so that err == io.EOF would instead become os.Error(err) == io.EOF.

That seems like a layering violation: io.EOF is an I/O error, not an OS error. What if the code is, say, io.Copy and is written to take an io.Reader. It makes no sense to be calling os.Error - there's nothing from package os in sight.

The obvious solution is to duplicate @rogpeppe's suggested changes for package os into package io, defining:

package io

type IOError interface {
    IOError() error
}

func Error(err error) error {
    if e, ok := errors.E(err).(IOError); ok {
        return e.IOError()
    }
    return nil
}

Then err == io.EOF becomes io.Error(err) == io.EOF. That is pretty clear, and I admit I quite like that line of code in isolation. To make that work, however, the definition of io.EOF has to change from:

var EOF = errors.New("EOF")

to the more complex:

type eofError struct{}
func (e eofError) IOError() error { return e }
func (e eofError) Error() string { return "EOF" }
var EOF = eofError{}

And then os.PathError has to implement both OSError and IOError:

package os

type PathError struct {
    Op string
    Path string
    Err error
}

func (e *PathError) OSError(err error) error { return /*os.*/Error(e.Err) }
func (e *PathError) IOError(err error) error { return io.Error(e.Err) }

And sometimes the underlying error is a syscall.Errno (a type implementing error), which can be important. Today you can write errors.Is(err, syscall.EMFILE). To support that, we'd have to add something like:

package syscall

type SyscallError interface { SyscallError() error }
func Error(err error) error { ... }

and then

func (e *PathError) SyscallError(err error) { return syscall.Error(e.Err) }

and then people could write syscall.Error(err) == syscall.ENOSPC instead of errors.Is(err, syscall.ENOSPC). Moving over to package net, net.OpError is going to need at least these methods, plus probably NetError.

And so on. The result is a very explicit, very clear statement of relationships between errors. You could even imagine making each of the specific error interfaces returning themselves, as in IOError() IOError instead of IOError() error, to catch even more in the type sytem. There's a lot to recommend this approach.

But if this approach has a fundamental flaw, it's the impossibility of generality. Consider a helper to unpack a zip file to disk:

package ziputil

func Unzip(dstdir, zipfile string) error {
    z, err := Open(zipfile)
    ...
    for _, f := range z.File {
        if err := unzipTo(filepath.Join(dstdir, f.Name), f); err != nil {
            return &UnzipFileError{ZipFile: zipfile, Name: f.Name, Err: err}
        }
    }
    ...
}

type UnzipFileError struct {
    ZipFile string // path to zip file
    Name string // name of entry in file
    Err error // error writing file
}

func (e *UnzipFileError) Error() string {
    return fmt.Sprintf("unzip %s: %s: %v", e.ZipFile, e.Name, e.Err)
}

The UnzipFileError allows the caller to programatically determine which file went wrong, in case that is necessary, but it also exposes the underlying error. The caller may also want to check for specific errors, like os.ErrPermission or syscall.ENOSPC. With Is/As/Unwrap, the author of ziputil only has to add:

func (e *UnzipFileError) Unwrap() error { return e.Err }

and then whatever worked for errors.Is and errors.As on the original error will work on the wrapped error. In particular, errors.Is(err, os.ErrPermission) and errors.Is(err, syscall.ENOSPC) both work. In contrast, to make both os.Error(err) == os.ErrPermission and syscall.Error(err) == syscall.ENOSPC work, you'd have to explicitly enable them:

func (e *UnzipFileError) OSError(err error) { return os.Error(e.Err) }
func (e *UnzipFileError) SyscallError(err error) { return syscall.Error(e.Err) }

This is more work but still is maybe OK and defensible, since presumably that code was calling os.Create, so it "knows" about os. (It's unfortunate to need to import syscall just to add SyscallError.)

But now suppose we want to generalize our zip helper to arbitrary file storage implementations:

package ziputil

type FS interface {
    Create(file string) (io.WriteCloser, error)
}

func Unzip(dst FS, zipfile *zip.Reader) error { ... }

Now there is not even a direct use of package os. It's hard for me to accept UnzipFileError being required to implement OSError and SyscallError when it is written in general terms without reference to either of those packages. And what if I have a remote file system implementation of FS that returns a RemoteFSError? If I want to check for a specific one of those errors, it's definitely not OK to expect ziputil to import that remote file system implementation to implement the RemoteFSError method. As a caller I guess I just have to know about UnzipFileError and test it myself:

err := ziputil.Unzip(myRemoteFS, zipfile)
if err, ok := err.(*ziputil.UnzipFileError) && remotefs.Error(err.Error) == remotefs.ErrProblem {
    ...
}

That is, in this case, the errors.E approach leaves the developer with no choice but to continue checking for and reaching inside specific wrapper types. Admittedly, this is true only for the inner error that the callee did not anticipate, but knowing that is problematic on its own: remotefs.Error(err) == remotefs.ErrProblem doesn't work in this case, because ziputil did not set up for remotefs, but maybe os.Error(err) == os.ErrPermission does work. How does a caller keep track of that? (There's no compile-time signal when you get it wrong.)

In contrast, with Is/As/Unwrap, there is a simple rule. Provided ziputil implements the single, trivial Unwrap method, then errors.Is works as well for remotefs.ErrProblem as it does for os.ErrPermission:

err := ziputil.Unzip(myRemoteFS, zipfile)
if errors.Is(err, remotefs.ErrProblem) {
    ...
}

I don't believe this is an isolated or unlikely example. I think this kind of thing will come up over and over again, in basically any code that tries to parameterize over an interface with a method that returns an error. In that setting, the caller knows more about what kinds of errors might come back than the callee does. (And even if the callee does know the full set, having to add all those methods may trigger imports that would otherwise be unnecessary.)

These examples are all about errors.Is, so they could be handled in part by redefining errors.E to return the canonical "Is" value (only "in part" because of legacy concerns like syscall.EPERM vs os.ErrPermission being different errors but also partly the same), or in full by defining that errors should implement an Is method that takes care of chaning, instead of Unwrap. That breaks down for type assertions, as I noted yesterday, so then you need to define that an error wrapper implements both chaining Is and chaining As, or you define (as we have) that an error wrapper defines the single chaining Unwrap used by both errors.Is and errors.As.

3/3

networkimprov commented 5 years ago

If we'd had errors.Is at the start, we could have let the os.File methods return a os.PathError even for EOF

This! When doing io.Copy(network_writer, file_reader) I want to retry or abort on filesystem errors, and return on network errors so as to close the connection in the caller.

It would help a lot if all stdlib errors could be traced to their source API. The generic sentinel errors seem mysterious.

earthboundkid commented 5 years ago

Hmm, with Is, io.Copy could start wrapping its errors in io.ErrReader or io.ErrWriter while preserving the underlying error. Not Go 1 compatible, but you could do it as a module version or something.

rsc commented 5 years ago

\@networkimprov, a bit off topic, but if you have io.Copy(w, r), w is a conn created by package net, and r is an *os.File, then you should be able to distinguish any error today by checking for *net.OpError vs *os.PathError already. The only non-wrapped possibility for w.Write or r.Read should be io.EOF from the read side, and io.Copy disposes of that. And if I'm wrong about that, then of course you could also write io.Writer and io.Reader wrappers that annotate the errors from the underlying calls however you want. No changes to the error model needed.

networkimprov commented 5 years ago

Thanks @rsc. Hm, I misspoke above. What had puzzled me was the variety of error types from net.Conn.Read() for a TLS connection. I have this...

      aLen, err = conn.Read(...)
      if err != nil {
         if err == io.EOF {
            // wrap up
         } else if _, ok := err.(tls.RecordHeaderError); ok {
            // log & fail
         } else if aErr, _ := err.(net.Error); aErr != nil {
            // check aErr.Temporary() & .Timeout()
         } else {
            // log & fail
         }

I wish they were all net.Error or net.OpError, especially in context of io.CopyN(file_writer, net_reader, count)

ConradIrwin commented 5 years ago

I agree with the observation that the new errors API seems under-tested, and I also share @rsc’s experience of trying several times to come up with a universal error API for introspection and struggling to make it perfect for all use cases.

Almost all the conversation so far has dealt with error handling, but I think error reporting is also important. I slightly prefer errors.Unwrap() over errors.E() because it allows easily printing out the chain of errors, which will make human debugging easier.

I assert this because when debugging an error you want to know, what was the program doing and what went wrong. Both proposals let you log what went wrong, but only by looking at the intermediate errors can you tell what the program was trying to do (that said, if go errors had stacktraces I think those would be better still, at significantly less code complexity to library authors).

When handling a concrete error I think the existing manual type casting is better than both errors.E() and errors.Is/As. In our codebase we have a bunch of helper functions like IsUniqueConstraintViolation and IsGmailRateLimit. I think we’d still have these even if there was a common Unwrap/E because the conditions we’re checking are not the type of the error, but specific fields.

The main use-case for Is/As that I see is so that libraries can start (or stop) wrapping errors without breaking calling code. I don’t know that this is a flexibility we should push for, as changing the concrete return type will still break any code that is doing error handling with explicit type casts.

In short, from my experience, it’s useful to have a common way to chain errors to assist with manual debugging and logging, and building further error tooling. I don’t see a need for a common way to pull specific errors out of the chain (explicit destructuring seems more robust).

My concrete proposal would be to keep errors.Unwrap() and Wrapper Interface but to not push errors.Is/As into the standard library - they (and more) can be implemented once you have a common interface for traversing the chain.

rogpeppe commented 5 years ago

[aside: I'm still working on my response, @rsc :) ]

Almost all the conversation so far has dealt with error handling, but I think error reporting is also important. I slightly prefer errors.Unwrap() over errors.E() because it allows easily printing out the chain of errors, which will make human debugging easier.

It's important to mention that neither E nor Unwrap are about printing out the chain of errors. Formatting of error message information is an orthogonal proposal.