golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.71k stars 17.5k forks source link

text/template: provide full stacktrace when the function call panics #59400

Open wxiaoguang opened 1 year ago

wxiaoguang commented 1 year ago

At the moment, the code is:

// safeCall runs fun.Call(args), and returns the resulting value and error, if
// any. If the call panics, the panic value is returned as an error.
func safeCall(fun reflect.Value, args []reflect.Value) (val reflect.Value, err error) {
    defer func() {
        if r := recover(); r != nil {
            if e, ok := r.(error); ok {
                err = e
            } else {
                err = fmt.Errorf("%v", r)
            }
        }
    }()
    ret := fun.Call(args)
    if len(ret) == 2 && !ret[1].IsNil() {
        return ret[0], ret[1].Interface().(error)
    }
    return ret[0], nil
}

If the function in template panics: {{.TheUnstableFunc}} , users and developers only know that it panics, but there is no stracktrace.

In a complex system, the template functions could also be complex, without stacktrace, it's difficult to locate the root problem.

It's really helpful to provide a full stacktrace when the template function panics.

We have encounters a related bug today, the TheUnstableFunc has concurrency bug, template package only reports runtime error: slice bounds out of range [2:1], it costs a lot of time for debugging the bug.

Thank you.

seankhliao commented 1 year ago

cc @robpike

ianlancetaylor commented 1 year ago

I don't see any reason for this to be a proposal, so taking it out of the proposal process.

Can you show us a complete, standalone example that demonstrates the problem? I don't understand it from the description. Thanks.

wxiaoguang commented 1 year ago

We have encounters a related bug today, the TheUnstableFunc has concurrency bug, template package only reports runtime error: slice bounds out of range [2:1], it costs a lot of time for debugging the bug.

The background is: we have a very complex function to be called in template: {{.TheComplexFunc}} , in TheComplexFunc, there are a lot of function calls including TheBuggyFunc, which has a concurrency problem, like this (just for example, to show it is complex):

func (t *Type) TheComplexFunc() {
    func1()...
    func2()...
    while ... {
         if (func3()) {
            TheBuggyFunc()
         } else {
             func4()
         }
    }
    func5()
}

Then we call the TheComplexFunc in template:

html code ...
{{$t.TheComplexFunc ....}}
html code ...

Since the TheBuggyFunc has concurrency bug, on high concurrent requests, it panics with slice bounds out of range [2:1].

But in error log, we only see: executing "my_template" at <$t.TheComplexFunc>: error calling TheComplexFunc: runtime error: slice bounds out of range [2:1]

TheComplexFunc is quite complex, it's unclear where the panic happens. I know that there could be some methods to help to locate the problem, but if we can see the full stacktrace in error:

executing "my_template" at <$t.TheComplexFunc>: error calling TheComplexFunc: runtime error: slice bounds out of range [2:1]

/framework.go:123 cases.String()
/my-code.go:123 TheComplexFunc()
/my-code.go:456 TheComplexFunc()
/template.go:... execute()

We can find the problem in first time.

Related issue: https://github.com/go-gitea/gitea/issues/23872

ianlancetaylor commented 1 year ago

It's impossible to be sure without an example, but I think you are referring to https://go.googlesource.com/go/+/refs/heads/master/src/text/template/funcs.go#355 . When a template calls a function, and the function panics, the template turns that into an error. That means that the default stack trace will not appear.

I don't think that we can change that behavior today. That would be too likely to break existing working code.

wxiaoguang commented 1 year ago

I don't think that we can change that behavior today. That would be too likely to break existing working code.

To avoid breaking, how about allowing developers to use a self-defined panic handler?

func safeCall(fun reflect.Value, args []reflect.Value) (val reflect.Value, err error) {
    defer func() {
        if r := recover(); r != nil {
            rErr := t.panicHandler(r) // <------ HERE
            if e, ok := rErr.(error); ok {
                err = e
            } else {
                err = fmt.Errorf("%v", rErr)
            }
        }
    }()
    ...
}

t.SetPanicHandler(func (v any) error {
    // we have a chance to handle the panic
})

t.Execute(...)
wxiaoguang commented 1 year ago

Hi @ianlancetaylor , what do you think about this proposal?

robpike commented 1 year ago

I would rather not make this change for a number of reasons, compatibility being one. But to find your bug you can either fork the package to add the traceback code (easy), delete the defer block (even easier), or just use a debugger (maybe easy, depending on the environment), which will in effect breakpoint when the trap happens before the template package catches it.

In other words, it is your responsibility to find the bug, not the library's. And really, it should be very easy to do.

wxiaoguang commented 1 year ago

compatibility being one

Sorry but I didn't see any compatibility problem here, we just need to add a new function, nothing breaks.

If there is any compatibility problem, please let me know.

But to find your bug you can either ...........

If a developer could catch the panic stacktrace in first time, why they should do so much unnecessary work? Especially for a production rare crash. A clear stacktrace saves everyone's time. It's really helpful.

it is your responsibility to find the bug

It is developer's responsibility to find bugs, but it's framework / language's responsibility to help developers to find bug, instead of hiding important clues / just returning an unclear error message.

I would say the template package do wrong here, because it doesn't follow Golang's philosophy: use panic stracktrace to locate the incorrect code.

And really, it should be very easy to do.

In many cases, not that easy. If framework could help, it is easier.


Could you please re-consider about this proposal? Thank you very much.

ianlancetaylor commented 1 year ago

This issue is related to the fact that fmt.Printf and friends catch panics in String and GoString methods, and the way that net/http catches panics in handlers. The behavior in the fmt, net/http, and text/template packages is normally good. Occasionally it impedes debugging. If we're going to change anything--and I'm not sure we are--perhaps we should consider some way to change it for all of them.

I don't think SetPanicHandler is a good choice as that injects too much flexibility. The choices are "recover panic" (as we do today) and "don't recover panic."

For example, one could consider setting the environment variable GODEBUG=recoverpanic=0.

robpike commented 1 year ago

Rather than change the libraries by requiring them to examine an environment variable, which will lead to a cascade of clunky changes, we could instead - as @ianlancetaylor may be saying above - just change the runtime to disable recovery of traps and panics (separately; panic is often used programmatically and it may just break things to disable it completely, while a trap caused by an array index out of bounds or a floating-point error is almost universally a bug).

wxiaoguang commented 1 year ago

and the way that net/http catches panics in handlers. The behavior in the fmt, net/http, and text/template packages is normally good. Occasionally it impedes debugging.

This problem is different from net/http:

  1. We can wrap every handler for net/http, we can easily catch all panics in our code. So I believe the net/http is very friendly to developers and we do not need to change it for panic handling.
  2. But we can't wrap every function for text/template, because some are called on a struct directly like {{.MyVar.BadFunnc}}.

For example, one could consider setting the environment variable GODEBUG=recoverpanic=0.

I do not think recoverpanic=0 is feasible. It is just a whack-a-mole game, and it might not catch the panic I described above.

For example, many large projects like Gitea, it uses quite a lot of 3rd packages, and it self sometimes also uses recover() to catch some well-defined panics.

In a busy production system, before the race panic occurs, a lot of other unrelated panics would occur ahead, I can't imagine the recoverpanic=0 could help to catch the real race panic.

ianlancetaylor commented 1 year ago

As @robpike says we can just disable recovering a runtime panic, while still recovering an explicit call to panic. That may break a few packages out there but it seems unlikely to affect many. It should give a stack trace for the case described in the original comment, as "slice range out of bounds" is a runtime panic.

To be clear, I'm not sure whether this is a good idea.

wxiaoguang commented 1 year ago

just disable recovering a runtime panic, while still recovering an explicit call to panic.

I see, if runtime panics like slice range out of bounds / nil pointer dereference / etc could be exposed, it's definitely better than before.

robpike commented 1 year ago

@ianlancetaylor If it breaks something, it's would only be under the flag, where almost by definition it is OK, compatibility-wise.

If we do anything, this seems the best. Programming each library to handle panics differently sometimes would be a mess.

wxiaoguang commented 1 year ago

Hello, is there some progress? We meet new runtime panic - error again https://github.com/go-gitea/gitea/pull/24342#discussion_r1181060895 , it's really difficult to debug, developers have to write a lot of temp "recover" code.

andreyvit commented 1 year ago

Just to add a data point, I have printstack snippet in my editor that expands into:

    defer func() {
        if e := recover(); e != nil {
            log.Printf("** PANIC: %v", e)
            debug.PrintStack()
            panic(e)
        }
    }()

and it's there solely for use with template functions, and every time it's a nuisance and takes a bunch of guesswork to debug, and every time I wish template didn't ever recover from panics.

I'd welcome an option to disable recover in safeCall for text/template specifically. I never want those panics to be recovered from, not even in production, so would just disable recovery permanently.

Like others noted, other panic recoveries (in net/http, for example) are not a problem, this really is specific to templates.

wxiaoguang commented 1 year ago

Ping, could the proposal be considered and accepted?

ianlancetaylor commented 1 year ago

I don't think we know exactly what to do. Do you want to try writing a trial implementation?

wxiaoguang commented 1 year ago

Some approaches:

  1. Don't introduce any new option, make safeCall panic again if the err is runtime error:
    • not ideal enough, I think it's better to keep consistent "panic" behavior
  2. Add an option: PanicHandler, make safeCall calls the PanicHandler first when the panic occurs
  3. Make safeCall return type ErrorWithStack struct { ... } when panic occurs
    • it could also be controlled by an option if necessary
  4. Add an option: NoPanicRecover or Option("onpanic=nop"), do not recover any panic in safeCall.

(I can see that there were already some options like "missingkey=zero", so adding a new option for the panic behavior seems not that difficult or breaking)

Which one do you prefer? Or could there be more possible approaches?

robpike commented 1 year ago

I still prefer doing nothing. If you want a traceback, edit safeCall to not recover. The existing behavior is right almost always.

wxiaoguang commented 1 year ago

I still prefer doing nothing. If you want a traceback, edit safeCall to not recover. The existing behavior is right almost always.

Sorry but I didn't get the point. How could "the existing behavior (hide the panic stacktrace)" be "right"? It causes some template function calls hard to debug their panics. There are also other reports like https://github.com/golang/go/issues/59400#issuecomment-1635480289 .

edit safeCall to not recover.

Do you mean every developer who uses Golang template should learn to build Go compiler/packages from source and use their home-made toolchain to debug the panics? I do not think it is friendly to developers and the eco-system.

wxiaoguang commented 1 year ago

I don't think we know exactly what to do. Do you want to try writing a trial implementation?

@ianlancetaylor what do you think about these possible approaches?

IMO the option 4 (reuse the option function: Option("onpanic=nop")) seems the best approach?

ianlancetaylor commented 1 year ago

I think that some version of option 3 seems workable. If safeCall recovers a panic, it would return an error that includes a stack trace at the point of the panic. I think that would be reasonably backward compatible.

robpike commented 1 year ago

I feel it's a bad precedent to start including true stack traces in errors. If you need to do something - and it still seems to me that you don't - an opt-in option that disables error recover feels right.