doc: define how sync/atomic interacts with memory model

rsc commented 11 years ago

Neither golang.org/ref/mem nor golang.org/pkg/sync/atomic say anything about what
guarantees are made by the atomic operations wrt the memory model. They should be as
weak as possible, of course, but right now they are non-existence, which is a little too
weak.

We might say, for example, that an atomic.Store writing a value to a memory location
happens before an atomic.Load that reads that value from the memory location. Is that
something we want to say? If not, what do we want to say?

What about Add?

What about CompareAndSwap?

rsc commented 6 years ago

Quoting https://github.com/golang/go/issues/5045#issuecomment-252730563 above:

Yes, I spent a while on this last winter but didn't get a chance to write it up properly yet. The short version is that I'm fairly certain the rules will be that Go's atomics guarantee sequential consistency among the atomic variables (behave like C/C++'s seqconst atomics), and that you shouldn't mix atomic and non-atomic accesses for a given memory word.

It's been two years and I still haven't gotten a chance to write this up properly, but it's getting close to the top of my work queue. The two years of on-again, off-again discussions have essentially confirmed this approach. People already write code assuming basically these semantics - I can't imagine doing something weaker than this from the "what you can rely on" point of view. So I don't think anyone should be blocked on this.

fmstephe commented 5 years ago

At two recent go conferences I have heard very high quality talks on low level concurrency in Go.

One was about a specific lock free algorithm used by prometheus. The other looking into how Go's race detector works. Both of these talks were of a very high quality and the audience showed a strong interest in the topics presented.

This demonstrates that there is an eager audience for documenting the semantics of the atomic package.

It also clear that while low-level atomics are niche and difficult to use they are already being used in production. The prometheus library is used extensively in a lot of Go systems.

If I do a search for source code importing atomic in just the packages I have on my work laptop I get 60 hits, only 3 of those are in code we authored ourselves (and we don't use prometheus).

I only write this to provide anecdotal data which suggests that documenting the semantics of the atomic package is probably very valuable. I'm not sure if that's a helpful comment or not.

nhooyr commented 5 years ago

specific lock free algorithm used by prometheus

Would be interesting in watching this talk, do you know if there is a video anywhere?

jasonwatkinspdx commented 5 years ago

@nhooyr https://docs.google.com/presentation/d/1wuNNW-g6v8qizIc_IxAGZTj-49TODKF0TYddTA1VDUo/edit#slide=id.p

folays commented 5 years ago

Please take this statement with some grain of salt, since my knowledge is more theory than practice.

I think that the lock-free algorithm used in the Prometheus talk linked by @nhooyr may exhibits undefined behaviour (a data race) on non-x86 hardware, specifically ARM with multiple processors and goroutines.

https://en.wikipedia.org/wiki/Memory_ordering specifies that x86 (but not ARM) prevents reordering of loads/stores after atomic operation (sync/atomic Load/Store/Inc/etc...)

Linux seems to even provides semantics to issue atomic*{acquire, release}() ops : https://www.kernel.org/doc/Documentation/atomic_t.txt

I wouldn't surprised if in Linux the rationale was to provide users of atomic_* operations to not issue an (useless) memory barrier after atomic ops (useless only on platform where atomic op are already serializing)

Golang source code seems to not explicitly issue memory barrier instructions on ARM after atomic-Xadd, but my asm knowledge for ARM is zero. On x86, Golang seems to only issue "lock"-prefixed xaddl (without barriers, but locked instructions should already be synchronising on x86).

If there Is confirmation that ARM can reorder atomic with load/stores, and that Golang does not explicitly add memory barrier instructions on atomic ops, maybe the histogram.go file from Prometheus is indeed subject to data race on ARM.

This would be a hint that in the Go language are missing either 1) user-issuable memory barrier 2) atomic*{relaxed, acquire,release} instructions ?

Maybe those two missing from the language is for the better, since missuses can cause sporadic data race.

In all cases, maybe the Go documentation should explicit if atomic_*() ops are either : 1) undefined behaviour when used trans-goroutines, on the specific matter of happens-before relationships, i.e. "it depends on the platform" (x86 will work ; ARM will bug) 2) defined behaviour : "we explicitly issue memory barrier instructions in your back" (heavy impact ?) 3) defined behaviour if used with acquire/releases semantics (missing from Go)

On the specific subject of user-issuable memory barrier, I guess that for any specific goroutine, a lock+unlock (successive) of an useless atomic.Mutex, only accessible from the specific goroutine, uncontended, and alone on it's cache-line could maybe be a hacky way of issuing a full memory barrier in Go.

On another hand, I understand that the rationale behind Go is to not try to synchronize your coroutines yourself, and instead use sync and sync/atomic which may or may not define the happens-before relationship.

For the "histogram.go" files, I did not benchmarked alternatives, but I'm a little sceptical on the rationale behind the "we do not want a channel to synchronize tons of observations from multiple goroutines". To me it seems that if multiples coroutines have tons of observations, which they atomicadd() themselves (the proposed solution of the talk speaker), it would incur some heavy cache-line bouncing. I guess that atomic*() instructions, even wait-free, are not cheap if your CPU core is not already an owner of the cache-line and if there is heavy contention. I guess that only one goroutine in charge of those atomic_inc() could benefit from an uncontested cache-line on the histogram, and could receive all observations from one channel (or even multiple channels to prevent contention -if it can happens- on the unique channel)

For the part I call "how to get a snapshot of the histogram to Write() it elsewhere, without blocking all observations" I guess that the speaker of the talk could keep his technique of having two versions of the same data structure (a hot and a cold one), and synchronizing the swap (upon Write-to-elshewhere) and access (upon obervations) with a sync.RWMutex.

The Write-to-elsewhere would Lock()+swap+Unlock()
The observations would RLock()+atomic.Inc()+Unlock() And the atomic.Inc() could be downgraded to downgraded to a non-atomic one if there was only one goroutine.

On a personnal preference, I would like to have access to low-level semantics from Go (atomic operations, relaxed or not, and barriers) It would gives options to try and benchmark code using them compared to order-synchronisation primitives provided by Golang (somewhat only mutex + channels).

ianlancetaylor commented 5 years ago

@folays The purpose of this issue is to document exactly what the Go functions in the sync/atomic package do.

Golang source code seems to not explicitly issue memory barrier instructions on ARM after atomic-Xadd, but my asm knowledge for ARM is zero.

The implementation of Xadd on ARM can be found at https://golang.org/src/runtime/internal/atomic/atomic_arm.go#L41 . As you can see, it relies on Cas. Cas is implemented either by the Linux kernel, or by https://golang.org/src/runtime/internal/atomic/asm_arm.s#L7 . In both cases the appropriate DMB (data memory barrier) instruction is used.

On a personnal preference, I would like to have access to low-level semantics from Go (atomic operations, relaxed or not, and barriers)

That is a fine thing to discuss, but that discussion is not appropriate for this issue. Please use the golang-nuts group for that. Thanks.

zhangfannie commented 5 years ago

I am trying to do some optimizations using ARMv8.1 instructions but face the difficulty to choose right instructions due to missing memory ordering definition.

There might be some other issues caused by missing memory ordering definition. Potentially, it can cause misalignment between Go application developers and Go toolchain developers. And the behavior may be not portable among different architectures. For example:

// goroutine 1
atomic.Store(&X, 1)
r1 = Y
// goroutine 2
atomic.Store(&Y, 1)
r2 = X
// afterwards
if r1 == 0 && r2 == 0 {
  panic("broken")
}

Please refer to the runnable test code. https://github.com/zhangfannie/go/blob/testatomic/test.go Above code won’t panic on x86, but may panic on arm64. Without an explicit memory ordering definition, it is hard to tell whether the application code is wrong, Golang x86 is wrong or Golang arm64 is wrong. Can we put some explicit definition in the document? So that people can have unified understanding of the memory order. And it will be helpful for us to select the right instructions when implementing the Golang backend. If we think current Golang implementation is correct and satisfies most use scenarios, it might be reasonable that we just refer to the most relaxed behavior among different backends for each operations. So that we can have a memory order definition without breaking existing applications.

tv42 commented 5 years ago

@zhangfannie You have a race with f1 reading Y and vice versa; the code is wrong on all platforms, even if some platform might not manage to trigger the race.

robaho commented 5 years ago

r1 and r2 need to be read by atomic.load(). But with a more explicit memory model that might not be needed - depends on if the compiler understands the semantics of the atomic methods.

zhangfannie commented 5 years ago

@tv42 @robaho @rsc The data race in the example code id intended. Actually, this is a modified version of Deller pattern from https://github.com/golang/go/issues/5045#issuecomment-66076296. Here what I want to say is, we do not know whether this code is wrong or not without a clear order definition.

The code is wrong if atomic.Store follows store release semantic and atomic.Load follows load acquire semantic. The load should be changed to atomic.Load. The code can be correct if the memory order definition requires full barriers inserted before and after atomic.Store.

Having a memory model defined can always be better than guessing the behavior of the toolchain.

robaho commented 5 years ago

Actually, since the program must be sequentially consistent in the absence of threads, the program should never panic.

tv42 commented 5 years ago

@zhangfannie The only happens-before relationship between the last iteration r1[i] = Y[i] and main is through wg.Wait(). There are no atomic operations after that last (non-atomic) assignment. There is no happens-before relationship at all between the last r1[i] = Y[i] and any part of f2. Hence, f2 reading the last r1[i] is a race. This follows from current-day contents of https://golang.org/ref/mem . The comment you tried to link to uses atomic loads; your code does not.

robaho commented 5 years ago

Even with atomic loads there is a data race, in that the only thing you can reason is that the values will never be 0 (unless they wrap around)

gopherbot commented 5 years ago

Change https://golang.org/cl/185737 mentions this issue: [RFC]sync/atomic: specify the memory order guarantee provided by Load/Store

gopherbot commented 5 years ago

Change https://golang.org/cl/189417 mentions this issue: sync/atomic define sync/atomic memory models

eloff commented 4 years ago

Can we just add "Go's atomics guarantee sequential consistency among the atomic variables (behave like C/C++'s seqconst atomics). You shouldn't mix atomic and non-atomic accesses for a given memory word, unless some other full memory barrier, like a Mutex, guarantees exclusive access. You shouldn't mix atomics of different memory sizes for the same address."

I think it's bad that this trivial issue has been open for 6 years. At this point there's no way to specify anything more relaxed, even if that were desirable, without breaking code in the wild in hard to detect ways.

I'll sign the contributors agreement and figure out how to issue the pull request / change request for it, if somebody on the Go team will sign off on the final wording and commit to reviewing/merging it.

robaho commented 4 years ago

For atomics to useful I think they have to establish a “happens before” relationship.

On Nov 16, 2019, at 9:53 AM, Daniel Eloff notifications@github.com wrote:

Can we just add "Go's atomics guarantee sequential consistency among the atomic variables (behave like C/C++'s seqconst atomics). You shouldn't mix atomic and non-atomic accesses for a given memory word, unless some other full memory barrier, like a Mutex, guarantees exclusive access. You shouldn't mix atomics of different memory sizes for the same address."

I think it's bad that it's taken 6 years to specify it, and at this point there's no way to specify anything more relaxed, even if that were desirable, without breaking code in the wild in hard to detect ways.

I'll sign the contributors agreement and figure out how to issue the pull request / change request for it, if somebody on the Go team will sign off on the final wording and commit to reviewing/merging it.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

zhiqiangxu commented 4 years ago

I agree with @eloff , we should let all users know that all exposed functions in sync/atomic guarantee sequential consistency, not only known in the go team(eg, https://github.com/golang/go/issues/32428#issuecomment-498719675 and https://stackoverflow.com/a/58892365/3382012), unless one day we decide to expose atomic.LoadAcq/atomic.StoreRel or alike , by that time we can add additional document for these functions though. It has been pending for 6 years, time to make a change:)

eloff commented 4 years ago

@robaho I'm referring to the C++ atomics happens-before ordering when talking about consistency. The docs can either point one to the C++ docs for the happens-before wording, or copy it.

Since there's no movement on this, I'm going to sign the contrib agreement and submit a pull-request.

bcmills commented 4 years ago

Here's a neat question that needs to be resolved: are programs allowed to use 32-bit atomic ops concurrently with 64-bit atomic ops on the same memory?

For example, is this program racy?

    var x uint64
    xa := (*[2]uint32)(unsafe.Pointer(&x))
    xl := &xa[0]
    xh := &xa[1]

    done := make(chan struct{})
    go func() {
        atomic.StoreUint64(&x, 0xbadc0ffee)
        close(done)
    }()

    x0 := atomic.LoadUint32(xl)
    x1 := atomic.LoadUint32(xh)

    <-done

My instinct is that this sort of size-mixing must not be allowed, because the implementations for 32-bit and 64-bit atomic ops may differ on some 32-bit hardware. However, the race detector does not currently flag it.

yangwenmai commented 4 years ago

specific lock free algorithm used by prometheus

Would be interesting in watching this talk, do you know if there is a video anywhere?

@nhooyr GopherCon UK 2019: Björn Rabenstein - Lock-free Observations for Prometheus Histograms

https://youtu.be/VmrEG-3bWyM

leventov commented 3 years ago

Currently, the usage examples of atomic.Value are unsound without defining the happens-before relationships between Store() and Load().

chrisprobst commented 3 years ago

My team just learned about this issue and we are using atomics for years. Are there at least any guarantees or is using atomics always undefined behavior in Go?

ianlancetaylor commented 3 years ago

Pedantically, when talking about things like memory models "undefined behavior" means that any random bug can happen (https://en.wikipedia.org/wiki/Undefined_behavior). That isn't the case here. Go is always going to do some approximation of the right thing.

There has been a lot of discussion on this issue. The general goal here remains https://github.com/golang/go/issues/5045#issuecomment-252730563 . But the precise details and wording remain open.

ghost commented 2 years ago

Can we just add "Go's atomics guarantee sequential consistency among the atomic variables (behave like C/C++'s seqconst atomics). You shouldn't mix atomic and non-atomic accesses for a given memory word, unless some other full memory barrier, like a Mutex, guarantees exclusive access. You shouldn't mix atomics of different memory sizes for the same address."

This is merely anecdote, but in my experience (full-time go developer >5y), the users of atomics assume sequential consistent ordering. I sure know I did!

btiernay commented 2 years ago

Any movement on this issue in the last year, or is this now being covered by https://github.com/golang/go/discussions/47141?

ianlancetaylor commented 2 years ago

I think this is now covered by #50859.

ianlancetaylor commented 2 years ago

This was done by https://go.dev/cl/381315.

robaho commented 2 years ago

Woo Hoo ! Just sneaked in under the 10 year mark!

golang / go

doc: define how sync/atomic interacts with memory model #5045