golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.1k stars 17.68k forks source link

runtime/race: issues on Go Memory Model and TSAN Race Detector #69821

Closed zhengshi1998 closed 1 month ago

zhengshi1998 commented 1 month ago

Proposal Details

Go version

go1.22.7 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/shizheng/.cache/go-build'
GOENV='/home/shizheng/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE=''
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH=''
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/shizheng/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/shizheng/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.7'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1772322777=/tmp/go-build -gno-record-gcc-switches'

What did you do?

The following program can generate an execution which TSAN regards as racy (the read and write to x at line 9 and 15). However, in any sequentially consistent execution, the read/write to x in line 9/15 cannot occur simultaneously.

package main

func main() {
    var x int
    ch := make(chan int, 2)

    go func(ch chan int) {
        if <-ch == 2 {   // line 8
            println(x)   // line 9
        }
    }(ch)

    ch <- 1              // line 13
    ch <- 2              // line 14
    x = 1                // line 15
    <-ch                 // line 16
    close(ch)
}

What did you see happen?

TSAN reports a race on two accesses to variable x, if we run this program with injected time delays using time.Sleep() function. The time-delay version of this program is below.

package main

import "time"

func main() {
    var x int
    ch := make(chan int, 2)

    go func(ch chan int) {
        time.Sleep(1 * time.Second) // 1s
        if <-ch == 2 {
            println(x)               // line 12
        }
    }(ch)

    ch <- 1                     // 0s
    ch <- 2                     // 0s
    x = 1                       // 0s   line 18
    <-ch                        // 0s   
    time.Sleep(2 * time.Second) 
    close(ch)                   // 2s
}

Running this program with go run -race test.go gives the following warning.

==================
WARNING: DATA RACE
Read at 0x00c0000140c8 by goroutine 6:
  main.main.func1()
      /home/shizheng/GoTests/email/mm3-time-day.go:12 +0x5d
  main.main.gowrap1()
      /home/shizheng/GoTests/email/mm3-time-day.go:14 +0x41

Previous write at 0x00c0000140c8 by main goroutine:
  main.main()
      /home/shizheng/GoTests/email/mm3-time-day.go:18 +0x1a4

Goroutine 6 (running) created at:
  main.main()
      /home/shizheng/GoTests/email/mm3-time-day.go:9 +0x165
==================
1
Found 1 data race(s)
exit status 66

The time-delayed program generates a sequential execution as following.

exec

What did you expect to see?

We expect this program to be not racy. Here is a short explanation.

We understand that the Go memory model defines an execution as racy if there are two conflicting events that are unordered by the Happens-Before (HB) relation. However, we are curious why the memory model doesn't define an execution as racy, if two conflicting events occur simultaneously in it. While these two definitions may be equivalent in shared memory communication, we discovered an example program (the program above) where a racy execution can occur according to the HB definition, but not under the alternative definition.

In fact, the original program is considered racy only with respect to the HB definition. This is because the read of x at line 9 can only occur if the receive operation at line 8 successfully receives the value 2. Given that the send operation at line 13 is ordered before line 14, there must be another receive operation that pops the value 1 out of the channel. Consequently, the receive operation at line 16 must have executed, which implies that the write at line 15 has also been executed. That is, in any executions where both the write and read to x occur, the write is always ordered before the read.

Additionally, we have a question regarding the HB edges defined for channels. Since Go appears to use mutexes/locks for channel operations, why does the Go memory model synchronize only accesses to the same slot of the channel buffer, rather than synchronizing all pairs of accesses on the same channel?

We would be grateful for any clarification you could provide on these matters. Thank you for your time and assistance!

gabyhelp commented 1 month ago

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

randall77 commented 1 month ago

In fact, the original program is considered racy only with respect to the HB definition. This is because the read of x at line 9 can only occur if the receive operation at line 8 successfully receives the value 2. Given that the send operation at line 13 is ordered before line 14, there must be another receive operation that pops the value 1 out of the channel. Consequently, the receive operation at line 16 must have executed, which implies that the write at line 15 has also been executed. That is, in any executions where both the write and read to x occur, the write is always ordered before the read.

This analysis requires understanding the values sent on the channel, and the value used in the conditional guarding the read of x. The race detector just isn't that sophisticated. It can only see send and receive events, not the values associated with those events. And it also does no control flow analysis - it doesn't understand anything about the conditionals it successfully got through to reach a given statement.

ianlancetaylor commented 1 month ago

For more background on why the memory model is written as it is, please see https://research.swtch.com/mm.

The memory model says:

If you must read the rest of this document to understand the behavior of your program, you are being too clever.

Don't be clever.

I'm going to close this because I don't think there is anything to do here. Questions and discussions are best held in a forum, not the issue tracker. See https://go.dev/wiki/Questions. Thanks.