ianlancetaylor opened 11 months ago
The general idea of this resonates with me, but it feels strange to me that this would be a runtime-changeable setting rather than a compile-time-changeable setting. In my application it would mean that setting this environment variable would expose the detail of which end-user-facing error messages are being derived from error.Error()
results and which are hand-written specifically for our application.
I tend to bristle a bit each time the Go runtime gains another way to globally change some runtime/library behavior out from under my application, since that adds yet another dimension of variation to consider when someone reports some unexpected behavior, but I suppose I am more sensitive to that than most Go application developers given that I use Go to develop CLI tools for which it should be irrelevant to the user what language they are written in. Probably it's more common to use Go to write server processes that are designed to run only in a particular execution environment and where everybody knows well that the program is written in Go.
This is honestly a broader piece of feedback than just on this proposal, and so I should probably record it somewhere else :thinking: but I would enjoy some way to compile an application such that it does not respond to GODEBUG
feature-switching at all, but have a way to explicitly turn on some of these features programmatically based on mechanisms that make more sense for my application, such as the application's configuration file. Then I would expose only the ones I know about and can see a reason for a user to turn on, and exclude any that have questionable value to my application.
Since I've already contributed a large number of words to this issue across two comments I'm going to extend this comment to include a response to Ian's later comment about it being an advantage to extend the error string, even though that makes this response appear out of order, just because it's an extension of my existing point above.
I consider it to be a disadvantage that it would affect the behavior of existing programs that have not been modified to request or expect stack traces, since some error messages are written for users and turning this on would significantly change those error messages.
Even though it's opt-in for a user, there are some applications (such as my own) where the user is not the person that ought to be making this decision, or if it is the user making the decision then it would be better to make it in a way that is more natural for the application.
Note that the real reason I want this is to report errors to some error reporting services like Sentry. Thus the stack trace must be retrievable in some programmatic way: either via a public function in `errors`, or via unwrapping to an exported type, i.e. not string parsing.
This type of functionality is common for long-running services where we want to report exceptions / unexpected errors. As a single data point, here's how common it is in our monorepo...
```
$ ag -i '(fmt.Errorf|errors.New|errors.Wrap)' pkg | wc -l
10573
```
Once we encounter an unexpected error, we have both grpc/http middlewares auto-magically extracting the stack trace and submitting it to sentry. As a success criteria of this change, the previous line count should be drastically reduced. In other words, most callsites should be changed from
```go
if err := myFn(...); err != nil {
    return errors.Wrap(err, "myFn")
}
```
To simply
```go
if err := myFn(...); err != nil {
    return err
}
```
The above comment from @fsaintjacques makes me consider a different design, which also happens to be a solution to my earlier feedback. :smile:
Building on the design approach for `errors.Is` and `errors.As`, with special optional methods that error implementations could implement, add `errors.StackTrace(err error) *Stacktrace` which:

- checks whether `err` has a method `ErrorStackTrace() *Stacktrace`; if not, immediately returns nil
- otherwise, returns `err.ErrorStackTrace()`
The default situation is that no errors have backtraces.
When GODEBUG=errstacktrace
is enabled, the errors.New
and fmt.Errorf
functions include a stack trace in the generated error value. The types returned by both implement the ErrorStackTrace
method as returning the saved stack trace, if present. Whether they do that by always returning a type containing a *Stacktrace
field that will often be nil
or by returning a different type depending on whether the stack trace functionality is enabled is an implementation detail up to the implementer, as far as I can tell.
The Error()
function for each of these types does not change at all: it just returns the specified message, as before.
If nothing in the program ever passes an error to errors.StackTrace
then the stack trace collection is just some wasted time collecting a stack trace nobody will refer to. But a program that does want to capture a stack trace, for example to send it to Sentry as described above, can try to pass the error to errors.StackTrace
and make use of the result whenever it's non-nil.
In the above I've assumed there's a material cost to capturing a stack trace and thus it would be too expensive to enable by default, but I've not actually measured that. If capturing a stack trace is relatively cheap then it would be nicer to not require the extra environment variable at runtime, but I don't know exactly how the Go runtime tracks stack traces internally.
As the author of gitlab.com/tozd/go/errors, a stack-trace-attaching errors package, I think having opt-in stack trace recording for standard errors would be great. But the current proposal has some issues.

I do not think it is reasonable or even workable to attach the stack trace to the Error string: how does `fmt.Errorf("%w: extra", err)` behave when `err` already contains a stack trace? It attaches `extra` to the end of the stack trace, while without `GODEBUG` it attaches it to the end of the Error line.

I think instead:

- a `StackTrace() []uintptr` interface method which returns `nil` without the `GODEBUG` flag, and what `callers()` provides when `GODEBUG` is set
- `[]uintptr`, to not have another type defined
- formatting with `%+v`, which is kinda standard for this these days

How exactly recording of stack traces is enabled, using `GODEBUG` or some other mechanism, I will leave to others to discuss. But I feel strongly that my proposal above nicely combines with existing approaches while not breaking anything.
The advantage of adding the stack trace to the Error
string is that it works in a program that has not already been designed to work with error stack traces. If you have to change the code to print or otherwise handle the stack trace, then it seems to me that you may as well use an existing package that manages stack traces for you.
That said, I agree that it makes sense to add a way to retrieve just the stack trace from the error, if there is one.
I also agree that we should arrange that using %v
or %w
in a fmt.Errorf
doesn't blindly copy the stack trace string. We should fetch the error string without the stack trace, format it, and then add in the stack trace. Right now I think it should just preserve the original stack trace, and not add a new one. Perhaps some more clever option is available.
I also agree with @mitar that the ideal signature is []uintptr
, that's what we do internally.
> In the above I've assumed there's a material cost to capturing a stack trace and thus it would be too expensive to enable by default, but I've not actually measured that. If capturing a stack trace is relatively cheap then it would be nicer to not require the extra environment variable at runtime, but I don't know exactly how the Go runtime tracks stack traces internally.
I would expect that this cost is paid in the (hopefully) exceptional case; very rarely will you see an error frequently returned in a hot loop.
@ianlancetaylor OK, what about this then: we do not change the `Error` string at all; we only unconditionally add the stack trace when the error is formatted, when the flag is enabled (with whichever verb)? That might be the right trade-off, because otherwise I can expect some tests might fail if they call `err.Error()` and expect it to have some structure (yes, bad programming, you should use `errors.Is`, etc., but still). But if you are doing `fmt.Printf("error: %s", err)` it is probably safer to include a stack trace?
This proposal is to add stack traces to errors.
The Go 1.13 errors proposal was to capture a single stack frame, and allow error wrapping to identify the trace of locations producing the error. This approach reduces the size of error values and can represent a trace that passes across goroutine boundaries, but produces less information in the simplest case.
I think any proposal to add stack information to errors needs to consider full stacks vs. frames, and present an argument for which approach is used.
@mitar Let me restate your suggestion. We keep the Error
method as is, but change the fmt package (other than fmt.Errorf
) so that in any case where we call the Error
method, we also check for a `Stacktrace` method and, if it is present, call it and also print the stack trace. We change `fmt.Errorf` so that if it wraps an error it does something intelligent with any stack trace. Does that sound right?
@neild This proposal suggests an optional feature that we expect to only be enabled while debugging. As such, I am thinking it should capture a complete stack trace.
@ianlancetaylor: Sounds good. And for the "so that if it wraps an error it does something intelligent with any stack trace" bit, from my experience: if it wraps more than one error (or no errors), you should record a new stack trace; if it is just one error, you should inherit the stack trace. But good luck reasonably dealing with stack traces when joining more than one error. :-) (In my package, I had to resort to indentation when formatting such errors, to visually show which stack traces belong where, since more than one stack trace is involved then.)
This proposal seems to have a gap where sentinel errors come into the picture. That is, things like:
```go
var ErrFoo = errors.New("blah blah blah")

func Bar() error {
    return ErrFoo
}
```
For one, that call to errors.New
probably cannot attach a stack trace, at least not in any meaningful sense. But also, I kind of want the return
to implicitly attach a stack trace, which is a thorny matter in legacy cases where people are using ==
instead of errors.Is
.
It seems like under this proposal, I'd have to instead do:
```go
var ErrFoo = errors.New("blah blah blah")

func Bar() error {
    return fmt.Errorf("%w", ErrFoo)
}
```
which seems kind of awkward. (Yes I probably want to do that anyway to force the use of errors.Is
but still.)
@rittneje I had this specific gotcha in mind as something that would be required, but I failed (I didn't really put in the effort) to see how it could be easily avoided. And to be clear, our internal library does something like this:
```go
pcs := ...
if strings.HasSuffix(runtime.FuncForPC(pcs[0]).Name(), ".init") {
    // Don't wrap when using sentinel errors defined in global vars
    return err
}
return wrapWithStackTrace(err, pcs)
```
which means that any sentinel defined errors don't have frames attached.
Couldn't error stack traces be somehow implemented by the compiler itself? I would imagine having a build tag of some sort (or GOEXPERIMENT?) that makes the error interface more 'heavyweight', i.e. it would have more fields internally used for stack traces. Then the stack trace would be available through some built-in or a function in the errors package.
And then each time a function returns a new error (detected as a conversion between concrete type to an error interface, or returning a global variable), the compiler would populate the internal error fields with the stack trace.
It would be nice if this would also somehow detect wrapping errors. Not sure how.
I don't know whether this is possible in Go, but Zig has Error Return Traces; maybe Go can grab some ideas from Zig?
Errors in Go are rarely exceptional; mostly they are values. It would be overkill to add a trace to all errors, like io.EOF, sql.ErrNoRows, custom errors/values... That's why I like the idea of `%w`, where we decide at each step in the code itself whether we need to keep something or not, and not globally.
Hi. To sum up what I understood the objectives to be from the conversations:
(hopefully I got it more or less right :sweat_smile:)
My opinions:
Possible solutions:
As I experimented here https://github.com/golang-exp/errors-poc, configuration can happen at runtime with an `ErrorLevel` akin to a `LogLevel`, using methods similar to `log/slog`.

- `var String = func(err error) string`, and this can be picked up by fmt, logger, ... when handling an error (??)
- a `log/slog`-style setup would allow a simple line or two in the main function to have the same outcome as the `GODEBUG=errorstacktrace=1` environment variable, but with this configuration being dynamic (see here)
- `var ErrFoo = errors.New("blah blah blah"); func Bar() error { return fmt.NewErrorFormatter(LEVEL_NO_STACK).Errorf("%w", ErrFoo) }`
Opaque: I think I have seen the `Opaque` function being implemented in the future `errors` (or maybe I have seen it somewhere else); that's nice.

Pop: It would be nice to have a `Pop` function extracting the last error struct{}, whatever that may be (string, programCounter, ..).
ErrorLevelVar with offsets: moreover, different `errorer` or `errorformatter` instances could be initialized with an offset; this is to allow a single `ErrorLevelVar` to configure the defaults as well as other instances.

example:

- an `errorer` with the default error level (i.e. LEVEL_NO_STACK = 0), thus behaving as current errors
- an `errorer` instance with an offset of +1 from the default, thus generating stack traces when `ErrorLevelVar` is 0 or more
- an `errorer` instance with an offset of +2 from the default, thus generating stack traces when `ErrorLevelVar` is -1 or more

PS:
> This proposal seems to have a gap where sentinel errors come into the picture.
Yea. In gitlab.com/tozd/go/errors I had to introduce errors.Base
for sentinel errors (without a stack trace) vs. errors.New
(which get a stack trace) for this exact reason. I could not find a better solution.
Maybe it could differentiate between top-level (or init) call to errors.New
and deeper call? Would anyone instantiate sentinel errors elsewhere?
> And then each time a function returns a new error (detected as a conversion between concrete type to an error interface, or returning a global variable), the compiler would populate the internal error fields with the stack trace.
That is an interesting approach which otherwise cannot be done. Maybe something which makes sense with the expensive GODEBUG
flag. I must admit that this issue made me hope that I could just run GODEBUG
in production and get stack traces everywhere. But one of the initial motivations was that you use this only at exceptional cases, when you are debugging. Maybe then it is OK to have an overhead of attaching stack traces at every return if it does not already have it?
I think it has already been said, but providing a machine-readable format (like JSON) from `[]uintptr` would be much more beneficial and could allow better automation for observability, incident response...
I think 3rd party libraries can provide tooling around `[]uintptr`: both stack trace formatting, JSON marshaling, etc. I do not think we have to bake this into stdlib. But you are right that standard errors (if they contain stack traces for fmt formatting) could also embed stack traces when they are JSON marshaled. Not everyone formats and prints it out; some might send it as JSON. But the question then is: should this feature be used in production? But yeah, having a standard JSON structure for stack traces could make things interoperable.
I want to emphasize that the goal of this proposal is to permit developers to easily enable a way for errors to include stack traces. In particular, developers can ask users to do this when running on their own systems with their own data. Ideas that rely on rebuilding the program should be a different proposal. Ideas that rely on using a different set of calls, or a different set of packages, to get a stack trace should be a different proposal. Thanks.
Changing the return value of Error()
will break anything doing manual string matching. It's bad and you shouldn't do that, but some programs do it anyway, especially when you're introspecting a third party error you don't control. Adding a new method seems much less likely to create unexpected breakages.
I like @mateusz834's suggested approach with Zig Error Return Traces.
From a developer's perspective, I'd prefer to have the option to switch to errors with a full stack trace, even if it comes with a slight performance trade-off, as it enhances code readability. Let's consider the example of the CopyFile function. In doing so, we can observe that adding a stack trace to just one function would require developers to use `fmt.Errorf` five times. If I have to implement a full stack trace in my entire application, it would necessitate modifications in thousands of places where I'd need to wrap err with `fmt.Errorf`.
```go
func Copy(dst, src string) error {
    if dst == src {
        return fmt.Errorf("%w", ErrInvalid)
    }
    srcF, err := Open(src)
    if err != nil {
        return fmt.Errorf("%w", err)
    }
    defer srcF.Close()
    info, err := srcF.Stat()
    if err != nil {
        return fmt.Errorf("%w", err)
    }
    dstF, err := OpenFile(dst, O_RDWR|O_CREATE|O_TRUNC, info.Mode())
    if err != nil {
        return fmt.Errorf("%w", err)
    }
    defer dstF.Close()
    if _, err := io.Copy(dstF, srcF); err != nil {
        return fmt.Errorf("%w", err)
    }
    return nil
}
```
@krhubert
```go
func Copy(dst, src string) (rerr error) {
    defer func() {
        if rerr != nil {
            rerr = fmt.Errorf("%w", rerr)
        }
    }()
    if dst == src {
        return ErrInvalid
    }
    srcF, err := Open(src)
    if err != nil {
        return err
    }
    defer srcF.Close()
    info, err := srcF.Stat()
    if err != nil {
        return err
    }
    dstF, err := OpenFile(dst, O_RDWR|O_CREATE|O_TRUNC, info.Mode())
    if err != nil {
        return err
    }
    defer dstF.Close()
    if _, err := io.Copy(dstF, srcF); err != nil {
        return err
    }
    return nil
}
```
but would be nice to have this automatically added.
UPDATE: this would not capture return location
One side idea. Currently, when you call panic(err)
, that calls err.Error() when formatting an unrecovered panic. But some of those errors do carry additional data (e.g., stack traces). @ianlancetaylor Do you think there is some room to do some improvements here? E.g., maybe there could be optional PanicError() string
method on errors so that they could control what is printed then? Probably doing whole fmt.Sprintf("%+v", err)
is something to avoid? Or could panic handler check if error implements fmt.Formatter
interface and only then call fmt.Sprintf
?
I am writing this here because I was first thinking that fmt.Sprintf
could be used when GODEBUG=errstacktrace
is enabled, but now I am thinking that it should probably be something always available. So PanicError() string
seems to me the best approach. Should I open a separate proposal for that?
Edit: I made the full proposal #63455.
I want to echo @carlmjohnson's point about the risk with changing the return of Error()
even on an opt-in basis.
Yes, matching on Error()
is bad but it exists in codebases—usually in tests, but sometimes in business logic.
There tends to be a mix of `strings.Contains(err.Error(), ...)`, which won't be affected by this change, and `err.Error() == ...`, which will break.
A search across GitHub for instances of exact error string matches outside test code:
https://github.com/search?q=path%3A.go+%22err.Error%28%29+%3D%3D%22+-path%3A*_test.go&type=code
Some popular Go projects from that search: chaosmonkey, GitLab, nomad, istio. (My intent with those examples is not to call those projects out. I'm trying to highlight that this is a thing that happens even in well-established Go projects.)
Re: Zig Error Return Traces, which tracks the code path that an error took to get to the user.
Stacktraces are useful in figuring out problems with how an unexpected error happened, but Error Return Traces are useful in figuring out problems in how an expected error was handled (flow control) and surfaced to the end user.
Stack traces have a large initial cost, while error return traces scale with each frame through which an error is returned.
There’s a library for Error Return Traces in Go: https://github.com/bracesdev/errtrace
It seems to me that an ideal hybrid solution would have non-sentinel New()
errors attach a StackTrace, but any subsequent error wrapping would add a single frame reference for a Return Error Trace.
Similarly, initial wrapping of a sentinel error would attach a StackTrace, but subsequent wrapping would add a single frame reference for a Return Error Trace.
@StevenACoffman was #60873 from this same idea?
We should find a way to distinguish errors that are pure values, like EOF, which don't need any frame attached, from errors that need to be analysed with the actual place where they were returned.
I also believe that it's where we return an error that we know whether we need to attach/annotate this error with the frame and nothing more (not the whole stack). We can already do this: `return fmt.Errorf("I'm in file xxx.go line:5 and return this error: %v", err)`.
@flibustenet @StevenACoffman
I agree, the full stack trace is really useful only with the first error in the stack. And one may not always want to include a frame at all.
On another note: I do not think the frame should ever be in the error message (except for ugly patches :)). What if, for example, I want to share the error with an external service? To obfuscate the frame I would need to manipulate the string (not very nice).
Also, good compatibility with previous versions is a big strength of Go, therefore the error string should remain unchanged to avoid a lot of refactoring in a lot of projects, and other reasons I guess (this was already mentioned somewhere in this thread I think).
If anything, changing the error message could be an optional thing, to avoid said refactoring. I don't see much value in this. I think it should be the logging library that outputs the information I want/need (i.e. frame, ..).
Personally my favourite decisions would be:
My favourite questions would be:
Adding an environment variable that enables error traces would eventually lead to it becoming the default, without the environment variable.
`GODEBUG=errstacktrace=1`
Error objects are not free; they cause memory allocations, and with tracebacks even more. Also, the useful depth depends on the use case: some are OK with 1 or 2, I am OK with 3 to 7 in my web applications, and some Java programmers might want 200. With variable depth it cannot be allocated on the stack.
It might be better to create another package in the standard library, like `stackerror` (stackerror.New, stackerror.Wrap, etc...).
And the current `fmt.Errorf` behaviour should not change, since `fmt.Errorf("something: %w", stackerror.Wrap(err))` is enough.
I propose serrors
, like there is log
and slog
, we could have errors
and serrors
. :-)
Errors in Go serve two main purposes: reporting problems to the user of the program, and reporting problems to the developer of the program. The user essentially never wants to see stack traces. For a developer, they can be helpful. Thus it is reasonable for developers to use packages like https://pkg.go.dev/github.com/go-errors/errors to collect developer errors--errors that the user is not expected to see.
That said, while developing a program, there are times when it is helpful for a user error to include a stack trace. For example, this is useful when some program bug is causing an unexpected user error. Getting a stack trace may help speed up debugging. This can also be useful when a user reports an inexplicable user error that is repeatable with proprietary user data; here again a stack trace may help the developer debug the problem.
Therefore, I propose a new `GODEBUG` setting: `GODEBUG=errstacktrace=1`. When this `GODEBUG` entry is set in the environment, the `errors.New` and `fmt.Errorf` functions will change to incorporate a stack trace into the message. This stack trace will be part of the string returned by the `Error` method (which will now be a lengthy, multi-line string). This will permit developers debugging an obscure but repeatable problem to set an environment variable to request a stack trace for the error that they are seeing.
We could also consider providing a simple mechanism for other error generation packages to check whether this `GODEBUG` setting is turned on.
This was inspired by a comment by @fsaintjacques: https://github.com/golang/go/issues/60873#issuecomment-1745646649.