Open srosenberg opened 1 year ago
Do we have a mechanism of ensuring panics are NOT recovered? I know in many instances we'll recover panics and, for example, report to Sentry. In most of these instances, we do want Cockroach to process exit because it's fundamentally unsafe to continue running.
Do we have a mechanism of ensuring panics are NOT recovered?
Not to my knowledge. Calls to pebble (from crdb) are typically executed via Stopper.RunXXX
which ensures any panic is reported (logcrash.ReportPanicWithGlobalSettings
) first [1], and then propagated all the way to start.go
which will handle panic
in RecoverAndReportPanic
[2], eventually leaving it to the runtime to crash the process. E.g., the stacktrace in [3] has pebble's SetMinVersion
invoked from grpc's interceptor, using Stopper.RunTaskWithErr
.
I know in many instances we'll recover panics and, for example, report to Sentry.
Could you give a couple of examples. In the majority of cases that I've seen, panics from pebble will not be recovered.
In most of these instances, we do want Cockroach to process exit because it's fundamentally unsafe to continue running.
Right, this is the case where we absolutely must crash the process at this moment in order to preserve data integrity. What about runtime's throw
? I see it's already used by pebble [4].
[1] https://github.com/cockroachdb/cockroach/blob/02ff871a4b4e16f7082298d7e098bfed70f528a9/pkg/util/stop/stopper.go#L338 [2] https://github.com/cockroachdb/cockroach/blob/02ff871a4b4e16f7082298d7e098bfed70f528a9/pkg/cli/start.go#L658 [3] https://github.com/cockroachdb/pebble/issues/2019#issue-1416668702 [4] https://github.com/cockroachdb/pebble/blob/b53472d0ec8663867ebb480ec64214ecf15e5f86/internal/manual/manual.go#L45
Thoughts from Jackson - we channel all of these "assertion" exits into a common codepath that does the exit. That way we have these in the one place.
And then we add a linter over top that ensures that we don't call exit other than this new codepath.
In light of a recent
exit 1
bug [1], [2], it seems prudent to revisit the contract wrtos.Exit
vs.panic
. Since pebble is consumed by CRDB as a library,exit
should be eschewed. In CRDB,os.Exit
is forbidden; this is enforced by theTestOsExit
linter [3]. Running the linter against pebble reveals a number of potentialos.Exit
cases,[1] https://github.com/cockroachdb/pebble/pull/2020 [2] https://github.com/cockroachdb/cockroach/pull/90478 [3] https://github.com/cockroachdb/cockroach/blob/master/pkg/testutils/lint/lint_test.go#L1239
Jira issue: PEBBLE-136