Open serathius opened 2 weeks ago
/assign @henrybear327
Hey @serathius, if I understand correctly, we would like to add a new failpoint type that leverages os.Exit(1)
instead of panic()
, is it? :)
Yes, would be also good to confirm how big is the difference between them. Not a Go expert but maybe there is explanation in Go documentation how Go runtime handles panic. It should help us confirm if we should have both options.
ALso, to not lose functionality from panic
, before we call os.Exit(1)
we would dump stacktrace to stderr.
Yes, would be also good to confirm how big is the difference between them. Not a Go expert but maybe there is explanation in Go documentation how Go runtime handles panic. It should help us confirm if we should have both options.
ALso, to not lose functionality from
panic
, before we callos.Exit(1)
we would dump stacktrace to stderr.
From my initial research, panic()
will stop the ordinary flow of control, execute the functions defined with defer
, and return to its caller. It's recoverable with recover()
within the defer
function if you want to handle it there.
On the other hand, os.Exit(1)
will terminate the program, without executing the defer functions, with no chance to recover, it's a hard stop of the program.
References:
For simulating a failing node, os.Exit(1)
would be a more realistic approach as it directly terminates the program. A quick code search (if I am not mistaken) shows that we are not handling the panic()
with recover()
within the codebase at the top level, so the effect of terminating a program seems to be the same to me so far (maybe there might be other shutdown mechanisms that I am not aware of so my conclusion might be wrong). But panic()
will contain more debugging traces for future analysis though.
What would you like to be added?
Using panic in failpoints results in process exiting, however the exit is handled through Golang runtime. It would be preferred to do a proper hard crash. This should be achievable by calling
os.Exist(1)
.cc @MadhavJivrajani @fuweid @henrybear327
Why is this needed?
More realistic failpoints.