Closed bcmills closed 4 years ago
Panics within panics hurt my brain.
I don't think this would be too hard to fix. We just need another bit in the goroutine to record the "in Goexit" state.
There's a comment in the runtime (from 2014):
// If defer was started by earlier panic or Goexit (and, since we're back here, that triggered a new panic),
// take defer off list. The earlier panic or Goexit will not continue running.
So it looks like the current behavior is at least somewhat intentional. @rsc, who wrote that comment.
panic
implementation. We should see if we can incidentally address this issue as part of that effort.This example is actually a bit of a special case, because the Goexit call and the recover are in the same stack frame/function:
func maybeGoexit() {
defer func() {
fmt.Println(recover())
}()
defer panic("cancelled Goexit!")
runtime.Goexit()
}
In this case, we are running somewhat into the "abort" rule (which is currently only described in the implementation, not in any spec). If the processing of defers by a recursive panic gets to the frame where an outer panic or Goexit happened, then that outer panic/Goexit is aborted, because there is no mechanism to resume it with a recover (since a recover always finishes the defers of the current frame and then resumes execution with the next outer frame). The abort rule is exactly related to the comment by @rsc above. If we wanted to allow the Goexit to continue, I think we would have add a bunch of mechanism (maybe some stack smashing) just to deal with this case. (The abort case is also mentioned in the Requirements section for the just-added open-coded defer proposal: https://github.com/golang/proposal/blob/master/design/34481-opencoded-defers.md#requirements )
But note that the more likely/useful case (where a panic and recover happens completely within a deferred function during a Goexit() works fine. The recover happens, and the Goexit() continues.
https://play.golang.org/p/kcQBNOAUeSa
So, for instance, it would work fine if a panic-recover happened during a Printf in a defer function (which can happen with nil pointers, etc. of the printed args).
So, I'll think about this a bit more, but I would lean toward not fixing this (because it would require a bunch of added mechanism just for this one case).
I doubt the current implementation of Goexit is to create a special panic. In @bcmills's example, the panic is recovered, but in @danscales' example, the panic is not recovered.
If this is true, I lean toward the panic created in Goexit should be always unrecoverable.
In fact, the key here is, if Goexit creates a panic, in @bcmills's example, the panic created in Goexit is suppressed by the new panic created in defer panic("cancelled Goexit!")
, which is consequently recovered. (If the suppression doesn't happen, the the recover will not happen too, for the panic created in Goexit is unrecoverable and will not cause crashing.) So, if the issue needs to be fixed, then we should avoid letting the new panic created in defer panic("cancelled Goexit!")
suppress the special panic created in Goexit.
note that the more likely/useful case (where a panic and recover happens completely within a deferred function during a Goexit() works fine
I agree that that's more likely, but there are realistic scenarios that could trigger this bug. For example, consider https://play.golang.org/p/vSL3LWEGpm2.
This example is actually a bit of a special case, because the Goexit call and the recover are in the same stack frame/function
Note that the issue can be reproduced with the Goexit
, recover
, and panic
all in separate functions (and presumably frames):
https://play.golang.org/p/M93Kxa4aBhX
I agree that that's more likely, but there are realistic scenarios that could trigger this bug. For example, consider https://play.golang.org/p/vSL3LWEGpm2.
Yes, you are demonstrating the exact panic abort semantics that I mentioned. Just to describe it again, an outer panic/Goexit is aborted if the inner panic reaches & processes any defers in the same frame or in a higher (outer) frame from the frame where the panic/Goexit happened. Once the outer panic/Goexit is aborted, it will no longer apply even if there is a recover() that recovers the inner panic.
I think you want to stop these unspecified abort semantics. Instead, you want to change the implementation so that if an inner panic is recovered in a frame at the same level or even further down in the stack from where the output panic/Goexit happened, you want the defer processing of the frames to continue under the auspices of the outer panic/Goexit.
Note that if the recover happens in a frame which is higher in the stack from the output panic/Goexit, we still want the current behavior, which is that the defers in the current frame are finished, and then "normal" execution continues in the next outer frame, which is some function called by a defer caused by the outer panic. So, the outer panic will continue as expected, but it will get to complete the running of the defer function that caused the panic.
Since the behavior of recursive panics/Goexits (including the abort semantics) was never specified in the first place, I think we have leeway on whether/how we want to fix this.
I've tried to get some feedback on improving the description of panic/recover/recursive panics in the Go language spec, but don't think we have consensus yet about whether or what to improve:
https://go-review.googlesource.com/c/go/+/189377
Feel free to add comments....
Instead, you want to change the implementation so that if an inner panic is recovered in a frame at the same level or even further down in the stack from where the output panic/Goexit happened, you want the defer processing of the frames to continue under the auspices of the outer panic/Goexit.
OK, and I think the hard part about doing this is that we presumably want to clean up the stack of the inner panic and somehow make it look like we are back to the outer Goexit/panic processing. That seems really hard to do without a lot of stack gymnastics (possibly stack smashing) and/or extra stubs to allow us to get control back to the runtime after cleaning up the stack. If we don't require that the stack is cleaned up, then getting rid of the abort behavior seems more doable.
I think you want to stop these unspecified abort semantics. Instead, […] you want the defer processing of the frames to continue under the auspices of the outer panic/Goexit.
Maybe? Consider https://play.golang.org/p/MyBCSSorYFl: the outer function defers a recover
and a cleanup function and then panics, and the cleanup function also panics independently
Intuitively, what I would expect is for the first defer
to catch the first panic
(“panic!
”), and for the second panic
(“whoops!
”) to continue propagating outward, since that panic
occurred after the execution of the function that performed the initial defer
.
I could also reasonably see dropping the second panic entirely, since the fact that it occurred is likely a side-effect of some invariant violated as a result of the first panic: if the first panic is fixed, perhaps the second one won't occur at all.
Today, the first recover
instead receives the second panic
, and the details of the first panic
are completely lost.
(I would call the second panic
the “inner” one, but maybe I'm using “inner” and “outer” a bit differently than you are.)
If panics are maintained in a stack and only the top one can be recovered (this example shows the possibility is high), then the panics created by Goexit calls shouldn't be pushed into the panic stack. Instead, each goroutine should maintain a separated field to record the last (or the first) panic created by Goexit calls.
BTW, the following program proves that if a panic created by Goexit is at the top of the panic stack, then all other panics will also not get recovered.
package main
import "fmt"
import "runtime"
func main() {
go func() {
defer func() {
fmt.Println("bb", recover()) // <nil>
}()
defer func() {
defer func() {
fmt.Println("aa", recover()) // <nil>
}()
defer runtime.Goexit()
panic(2)
}()
panic(1)
}()
select{}
}
@ianlancetaylor It looks gccgo and gc disagree on the last example.
@go101 Thanks for noticing that difference. I think we should work out this issue, and then we can make sure that the implementations do the same thing.
It looks the difference is, in gccgo, the panic created in Goexit will not shadow the newest panic, whereas gc will. In gccgo, the panic created in Goexit will shadow the panics before the newest panic.
Yes, but let's first figure out what is correct, and then implement that.
Personally, I think panicking and exiting should be two different properties of gorotines.
Change https://golang.org/cl/200081 mentions this issue: runtime: ensure that Goexit cannot be aborted by a recursive panic/recover
I figured out how to make sure that Goexits are not aborted by recovers that happen in defers below them... It didn't require any stack smashing or extra stubs -- just saving the pc/sp when making calls to deferred functions from the Goexit defer processing loop. See the mentioned change for the fix.
We discussed the abort behavior with respect to panic and Goexit, and are leaning toward only changing the Goexit behavior (which is the mentioned changed). As summarized well by Robert at https://go-review.googlesource.com/c/go/+/189377/7/doc/go_spec.html#6169 [paraphrased here]
From a programmer's point of view, if a defer contains a recover() call, I expect it to catch any pending panic, no matter where they are coming from. Otherwise it's impossible to write code that protects against (expected and unexpected) panics leaving a function/API entry point. So that would argue against "only recover the panic initiated in its own function".
And the abort (replace) behavior for panics exactly coalesces panics down so that we make sure that a recover in a function always stops all panics within that function or its callees.
There is another related question: should a Goexit
call save the a panicking goroutine from crashing? The current gc and gccgo both do this. Personally, I think it is not very reasonable.
package main
import "runtime"
import "time"
func main() {
go func() {
defer runtime.Goexit() // [edit]: add the "defer"
panic(1) // will not crash the program
}()
for {time.Sleep(100000)}
}
@go101, in that snippet the runtime.Goexit()
exits the goroutine immediately, so the panic
line is not even reached.
But if you defer
the call to runtime.Goexit()
, that does suppress the panic (https://play.golang.org/p/JeXz2KhA5f_h), and I agree that it should not.
A more realistic example illustrating that problem (https://play.golang.org/p/Gx4zGz5L79i):
func TestSkipAfterPanic(t *testing.T) {
defer func() {
if !t.Failed() {
t.Skip("huh?")
}
}()
panic("serious bug that should fail this test!")
}
Yes, I missed the defer
before Goexit
. ;D
It looks the docs says T.Skip()
calls Goexit()
. It also says if a test has already failed, Skip
doesn't change the failed state. So panicking is not viewed as testing failed?
The test doesn't fail per se until the panic
propagates out of the Test
function, so an active panic is indeed not a failure at that point.
Feel free to file a separate bug that Goexit can cancel a Panic. I'm not sure that fix is very high priority, since it seems very unusual to put a Goexit (or something that would do a Goexit) in a defer statement. Also, I'm not sure what a reasonable change in semantics would be. Should Goexit() become a no-op if you are in a panicking sequence? I think that would be surprising to most programmers. Or would you do something more dramatic, like aborting the entire defer call that contains the Goexit (and is being run by the panicking sequence)?
@danscales, probably the Goexit
should allow the panic to continue, but switch to an orderly Goexit
if all of the active panics are eventually recovered.
Yes, but I think a Goexit becoming a no-op in the panic case may be quite surprising to programmers and hence lead to other weird failures in the code immediately following the Goexit.
In practice the ~only real user of runtime.Goexit
is the testing
package, and for that case it seems decidedly less weird for an unrecovered panic to cause a test failure than for t.Skip
to cancel an unrecovered panic.
But you're right that I would need to update the double defer
sandwich to account for the possibility of a Goexit
resuming after a successful recover
. Perhaps it will need to become the “triple defer
with cheese”. 😃
Another option would be to make runtime.Goexit
itself panic if the goroutine is already panicking. That would at least make the t.Skip
example do something more reasonable (specifically, fail the test if it tries to Skip
during an unrecovered panic).
Maybe two panic stacks are needed. One for normal panics, the other for Goexit panics. Or each non-exited function call may associate with at most two unrecovered panics, one is normal panic, the other is Goexit panic. Goexit panics are not recoverable. Either of two panics existing will make the associated function call enter its terminating execution phase,
I just modified my explanation on the panic/recover mechanism to include Goexit
signals.
One reason I think Goexit
signals should not shadow normal panics is that, if we put a defer runtime.Goexit()
call at the beginning of a goroutine entry function, then the defrerred call acts as an ultimate recover operation.recover
call
By describing it in non-optimized code:
type _panic struct {
// Both of the two can be true.
// causedByPanic may be changed from true
// to false, but causedByExit may not.
causedByPanic bool
causedByGoExit bool
headReason *_panicReason
tailReason *_panicReason
}
type _panicReason struct {
value interface{}
prev *_panicReason
}
// When a function call exits, its associating `_panic`
// will be merged with the `_panic` associating with its caller.
...
if exitedCall.panic.causedByPanic {
caller.panic.causedByPanic = true
if exitedCall.panic.tailReason != nil {
exitedCall.panic.tailReason = caller.panic.headReason
}
caller.panic.headReason = exitedCall.panic.headReason
if caller.panic.tailReason == nil {
caller.panic.tailReason = caller.panic.headReason
}
}
caller.panic.causedByGoExit = caller.panic.causedByGoExit || exitedCall.panic.causedByGoExit
if !caller.isExiting && (caller.panic.causedByGoExit || caller.panic.causedByPanic) {
caller.startExiting()
}
...
[edit]: to optimize, the panic reason list can be move the the _goroutine
struct as a stack (just as the current implementation), and the two bool
s can be replaced with bits.
@griesemer @ianlancetaylor Is it possible that gc and gccgo can make agreement on the panic and Goexist mechenism in Go 1.14? Specifically,
In my opinion, the earlier the points are clarified, the better for the Go community.
I don't understand precisely what your questions are, but I agree that these questions need to be resolved in the language spec.
That said, it does not seem urgent to me. These issues have been confused for many years and nobody has cared. If we sort them out for 1.14, great. If not, OK.
I wrote a book and one chapter in it explains the panic/recover mechanism. And I plan to make it cover runtime.Goexit
calls. So I very care about this. I really don't want to spread wrong explainations into the Go community.
I think the implementation is not urgent to be corrected (if the current implementation is really not perfect), but the mechanism should be settled down as earlier as possible, if there are no special difficulties to prevent it being done.
BTW, @ianlancetaylor , thanks for your explanation in another thread. I will write down the confirmed part in my book. I asked this question is because I need to finish the chapter in my book. If this is impossible in Go 1.14, then I will mention the Goexit
related part is still not formalized yet., otherwise, I will wait unitl it is formalized.
@go101 I think we have mostly settled that the behavior of panic and recover will stay as it is right now. My current proposed changes to the Go language spec are here (don't know if you've seen the latest changeset):
https://go-review.googlesource.com/c/go/+/189377
I/we are not trying to specify every possibility for panics/recovers in the spec, but just want to specify better when recovers actually apply (only in a defer directly called by a panicking sequence) and then describe the the typical case of recursive panics and also the behavior of when one panic can replace another panic. See also my comment on Oct 9th above on why the panic replacement behavior makes sense.
Goexit is a library routine, so it will not be specified in the language spec. However, as I mentioned above, we are planning to fix the current bug and never allow a panic/recover to cancel a Goexit. We have to discuss more about further interaction between panic and Goexit (as your example in your Sept 26 comment and examples from Bryan Mills illustrate). The fix for the current bug may make it into Go 1.14, but not definitely. Since we still have to discuss things, I'm fairly sure that no further changes in Goexit/panic/recover interaction will make it into Go 1.14.
@danscales Thanks for the clarification.
we are planning to fix the current bug and never allow a panic/recover to cancel a Goexit.
What about the inverse? Will Goexit calls cancel unrecovered panics?
I am guessing that you are asking whether defer Goexit()
should recover a panic that is currently being thrown. I don't think it should. Goexit
is not recover
.
Yes, that is what I asked. Thanks for the clarification.
Should this following program print should be unreachable
or not.
I tried tip, it prints it.
edit: sorry, the following program is a wrong example. Please see this comment for reason. ;D
package main
import "fmt"
import "runtime"
func f() {
defer func() {
recover()
}()
panic("will cancel Goexit but should not")
runtime.Goexit()
}
func main() {
c := make(chan struct{})
go func() {
defer close(c)
f()
fmt.Println("should be unreachable")
}()
<-c
}
And should the following program crash? I tried tip. It doesn't, though I feel it should crash by @ianlancetaylor's clarification above.
package main
import "runtime"
func main() {
c := make(chan struct{})
go func() {
defer close(c)
// The Goexit signal shadows the
// "bye" panic, but it should not.
defer runtime.Goexit()
panic("bye")
}()
<-c
}
Should this following program print
should be unreachable
or not.
I agree that it should not, and I can reproduce the same behavior you observe using gotip
.
And should the following program crash?
That one is less clear to me, although I agree that @ianlancetaylor's comment suggests that it should crash.
I agree with @bcmills.
Should this following program print
should be unreachable
or not. I tried tip, it prints it.package main import "fmt" import "runtime" func f() { defer func() { recover() }() panic("will cancel Goexit but should not") runtime.Goexit() } func main() { c := make(chan struct{}) go func() { defer close(c) f() fmt.Println("should be unreachable") }() <-c }
This is exactly performing to spec and is unrelated to the current bug. https://golang.org/ref/spec#Handling_panics and https://go-review.googlesource.com/c/go/+/189377
The panic() call starts a panic sequence. By spec, the function that called panic will never continue running normally (hence you will never reach the Goexit). If a recover occurs in a defer directly called by the panicking sequence, then the panic will be recovered, and (after finishing any remaining defers in that frame), execution of the goroutine will continue in the caller of that frame (which in this case is f).
package main import "runtime" func main() { c := make(chan struct{}) go func() { defer close(c) // The Goexit signal shadows the // "bye" panic, but it should not. defer runtime.Goexit() panic("bye") }() <-c }
This is the issue that we mentioned above, which is that a Goexit can cancel a panic. I just filed a separate bug for this (https://github.com/golang/go/issues/35378), and we can discuss there. As I mentioned, any fix so that a Goexit does not cancel a panic will not make it into 1.14.
Ah, yeah. I somehow read the panic("will cancel Goexit […]")
as defer panic([…])
instead. 😞
Ah, sorry, I also acted as it is a deferred call. ;D
It looks this problem has been fixed in Go SDK 1.14. Should it be closed?
Yes, let's close. This was fixed by my change 7dcd343 . I already opened a separate issue #35378 about the converse situation that a Goexit call can cancel a panic.
The documentation for
runtime.Goexit
says (emphasis mine):However, this is empirically not the case if one of the deferred functions itself triggers a
panic
: thatpanic
can berecover
ed, and thatrecover
cancels the termination of the goroutine.See https://play.golang.org/p/7VrxPDByUNT for a minimal illustration.
Found while investigating #29207.
CC @aclements @randall77