golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.29k stars 17.7k forks source link

runtime: new goroutine launches can starve other work indefinitely #52934

Open rhysh opened 2 years ago

rhysh commented 2 years ago

What version of Go are you using (go version)?

$ go version
go version devel go1.19-db875f4d1b Mon May 16 16:15:58 2022 +0000 darwin/arm64
$ go1.18 version
go version go1.18.2 darwin/arm64

Does this issue reproduce with the latest release?

The problem is present in Go 1.18 and in the development tip.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/rhys/Library/Caches/go-build"
GOENV="/Users/rhys/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/rhys/go/pkg/mod"
GONOPROXY="*"
GONOSUMDB="*"
GOOS="darwin"
GOPATH="/Users/rhys/go"
GOPRIVATE="*"
GOPROXY="direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="devel go1.19-db875f4d1b Mon May 16 16:15:58 2022 +0000"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/lf/n4ry0qv5639_3z0nhyhy3z7c0000gs/T/go-build743809376=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

// Copyright 2022 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package main

import (
    "fmt"
    "os"
    "sync"
    "sync/atomic"
    "time"
)

const duration = 2000 * time.Millisecond
const workers = 3

var stop int32

func main() {
    time.AfterFunc(duration, func() { atomic.StoreInt32(&stop, 1) })

    var wg sync.WaitGroup
    wg.Add(1)
    go func() { wg.Done() }()
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            wg.Done()
            churn()
        }()
    }
    wg.Wait()

    if atomic.LoadInt32(&stop) == 1 {
        fmt.Printf("PROBLEM: first goroutine did not get a chance to run\n")
        os.Exit(1)
    }
    fmt.Printf("ok\n")
}

func churn() {
    if atomic.LoadInt32(&stop) == 0 {
        go churn()
    }
}

What did you expect to see?

I expected each goroutine in the program to get a chance to run within a few 10ms scheduling intervals. The result of that would be for both wg.Done calls to execute, for the wg.Wait call to complete, and for the program to exit quickly without error.

What did you see instead?

When I run the test program with GOMAXPROCS=1, the new goroutine launches keep the runtime busy so it does not give all existing goroutines a chance to run. The goroutine that does nothing but call wg.Done does not get a chance to run until the timer expiration stops the new goroutine launches after 2000 milliseconds.

This isn't specific to GOMAXPROCS=1, additional calls to go churn() can keep additional scheduler threads busy.

Refined from an original report in #52916. CC @golang/runtime

$ GOMAXPROCS=1 go run ~/Documents/src/nursery/issue/golang/go/52xxx/churn.go    
PROBLEM: first goroutine did not get a chance to run
exit status 1
$ GOMAXPROCS=2 go run ~/Documents/src/nursery/issue/golang/go/52xxx/churn.go
PROBLEM: first goroutine did not get a chance to run
exit status 1
$ GOMAXPROCS=3 go run ~/Documents/src/nursery/issue/golang/go/52xxx/churn.go
PROBLEM: first goroutine did not get a chance to run
exit status 1
$ GOMAXPROCS=4 go run ~/Documents/src/nursery/issue/golang/go/52xxx/churn.go
ok
$ GOMAXPROCS=1 go1.18 run ~/Documents/src/nursery/issue/golang/go/52xxx/churn.go
PROBLEM: first goroutine did not get a chance to run
exit status 1
$ GOMAXPROCS=2 go1.18 run ~/Documents/src/nursery/issue/golang/go/52xxx/churn.go
PROBLEM: first goroutine did not get a chance to run
exit status 1
$ GOMAXPROCS=3 go1.18 run ~/Documents/src/nursery/issue/golang/go/52xxx/churn.go
PROBLEM: first goroutine did not get a chance to run
exit status 1
$ GOMAXPROCS=4 go1.18 run ~/Documents/src/nursery/issue/golang/go/52xxx/churn.go
ok
gopherbot commented 2 years ago

Change https://go.dev/cl/406654 mentions this issue: runtime/pprof: slow new goroutine launches in test

prattmic commented 2 years ago

The runtime does generally give some priority to new goroutines, because runtime.newproc puts new goroutines in the runnext slot, and runtime.findRunnable (generally) prefers the local runq (of which runnext is on top) over the global runq. Perhaps we are a bit too aggressive there.

runnext does make the new goroutine share the same time slice with the parent goroutine (see inheritTime here), but I suspect that churn runs for such a short period of time that sysmon's attempts to preempt it are generally unsuccessful [1].

[1] I think to succeed, it would need to preempt after churn starts executing, but before it gets to the body of runtime.newproc, which is realistically probably only a dozen or so instructions.

shubhang93 commented 7 months ago

This is still prevalent in Go 1.22 and happens mostly when I am running the go test command with GOMAXPROCS = 1