golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.13k stars 17.45k forks source link

runtime: concurrent map iter+write fatal doesn't respect GOTRACEBACK #68019

Open bradfitz opened 2 months ago

bradfitz commented 2 months ago

Go version

go version go1.22.4 linux/amd64

What did you do?

What did you see happen?

... we hit the race on the map in prod after an hour and 💣

The Go runtime then did its fatal:

fatal error: concurrent map iteration and map write

But then we got 3 GB of GOTRACEBACK stacks from the Go runtime for 1M+ stacks, about 3 GB more than we needed to find the bug, overwhelming our logging system in the process, causing a secondary outage.

GOTRACEBACK docs say:

The failure prints stack traces for all goroutines if there is no current goroutine or the failure is internal to the run-time.

And arguably a map race is internal to the run-time insofar as maps are implemented in the runtime.

But the Go runtime know it's the user's fault; note the "but is used when user code is expected to be at fault for the failure" bit here:

// fatal triggers a fatal error that dumps a stack trace and exits.
//
// fatal is equivalent to throw, but is used when user code is expected to be
// at fault for the failure, such as racing map writes.
//
// fatal does not include runtime frames, system goroutines, or frame metadata
// (fp, sp, pc) in the stack trace unless GOTRACEBACK=system or higher.
//
//go:nosplit
func fatal(s string) {
        // Everything fatal does should be recursively nosplit so it
        // can be called even when it's unsafe to grow the stack.
        systemstack(func() {
                print("fatal error: ")
                printindented(s) // logically printpanicval(s), but avoids convTstring write barrier
                print("\n")
        })

        fatalthrow(throwTypeUser)
}

What did you expect to see?

I'd expect fatalthrow(throwTypeUser) to be treated as a normal panic with GOTRACEBACK=single respected.

Currently it's ignored:

bradfitz@book1pro ~ % GOTRACEBACK=single go run maprace.go 2>&1 | wc -l
     618
bradfitz@book1pro ~ % GOTRACEBACK=none go run maprace.go 2>&1 | wc -l
       2
bradfitz@book1pro ~ % GOTRACEBACK=system go run maprace.go 2>&1 | wc -l
    1684

(At least none works, but that's too quiet, hiding the actual problem).

gabyhelp commented 2 months ago

Similar Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

MikeMitchellWebDev commented 2 months ago

It says the following in runtime1.go

// gotraceback returns the current traceback settings.
//
// If level is 0, suppress all tracebacks.
// If level is 1, show tracebacks, but exclude runtime frames.
// If level is 2, show tracebacks including runtime frames.
// If all is set, print all goroutine stacks. Otherwise, print just the current goroutine.
// If crash is set, crash (core dump, etc) after tracebacking.