golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.97k stars 17.66k forks source link

sync/atomic: add Uint64Pair #61236

Open haraldrudell opened 1 year ago

haraldrudell commented 1 year ago

I suggest CAS2 is implemented in the atomic package similar to atomic.align64, operational for hardware that supports it.

possibly also:

darwin-arm64 darwin-amd64 linux-amd64

I think also atomic generics could do with another polish Like this AtomicMax that is more flexible in what underlying integer types and named types can be used: https://github.com/haraldrudell/parl/blob/main/atomic-max.go

ianlancetaylor commented 1 year ago

I think the proposal here is

// CompareAndSwapUint128 executes the compare-and-swap operation for a 128-bit
// integer, represented as a pair of 64-bit integers.
func CompareAndSwapUint128(addr *uint64, old1, old2, new1, new2 uint64) (swapped bool)

But see also #9455. If we adopt that, then this version of the function would no longer make sense. So perhaps the proposal here should be

// CompareAndSwapUint64Pair executes the compare-and-swap operation for a 128-bit
// integer, represented as a pair of 64-bit integers.
func CompareAndSwapUint64Pair(addr *uint64, old1, old2, new1, new2 uint64) (swapped bool)

Presumably hardware does not have a 128-bit CAS operation could implement this using something like the lock table in runtime/internal/atomic/atomic_arm.go.

randall77 commented 1 year ago

Perhaps

func CompareAndSwapUint64Pair(addr *[2]uint64, old, new [2]uint64) (swapped bool)

On x86 at least, the hardware instructionCMPXCHG16B requires its argument to be 16-byte aligned. I'm not sure how we would enforce that. It sounds like #599 all over again.

merykitty commented 1 year ago

We could have atomic.Uint64Pair which can be enforced specially by the compiler to have 16-byte alignment. An API point to operate on arbitrary pairs of uint64 seems improbable.

haraldrudell commented 1 year ago
  1. CAS2 atomic.CompareAndSwap is one required 128-bit instruction for wait-Free 2016+
  2. The other is fetch-and-add FAA atomic.Add
  3. LL/SC is only used if the other 2 do not exist and is actually more complicated, likely for some unusual platforms that are not arm64 or amd64
  4. If there is none of that, it’s slow-path

There is a Go package that uses array of uint64, [2]uint64 in its api: github.com/CAFxX/atomic128

Generally, the data used is not a 128-bit number but a composite of often two 64-bit numbers that are changed atomically as a pair

CAFxX commented 1 year ago

There is a Go package that uses array of uint64, [2]uint64 in its api: https://github.com/CAFxX/atomic128

FTR this was just out of necessity (due to the lack of uint128) more than anything else. It's true that both intel and arm define their DWCAS instructions as operating on pairs of 64-bit values, but that does not automatically mean we should expose that in our APIs (especially given the alignment requirements that are uncharacteristic of uint64s)

rsc commented 1 year ago

Which systems would support this function?

rsc commented 1 year ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

CAFxX commented 1 year ago

Which systems would support this function?

Natively: I'm aware of at least amd64 with CMPXCHG16B (AMD64v2 and above) and aarch64/ARMv8.1 with LSE

Others will have to be emulated (I just use a mutex in my library in this case, but other implementations are possible e.g. for aarch64 there are DW LL/SC instructions that were available before the native DWCAS)

rsc commented 1 year ago

We don't want to have an API that's unavailable or always fails on 32-bit systems, so what do we do on 32-bit systems? If we use a lock on 32-bit systems then we need to provide ways for code to read the 64-bit halves or the whole 128-bit value. (Currently there's no way to find out what the value has!)

Perhaps the right path forward is to define a type wrapper like atomic.Uint64 and not the un-wrapped one.

package atomic
type Uint64Pair struct { ... unexported ... }
func (x *Uint64Pair) Load() [2]uint64
func (x *Uint64Pair) Store(new [2]uint64) 
func (x *Uint64Pair) CompareAndSwap(old, new [2]uint64) bool
func (x *Uint64Pair) Swap(new [2]uint64) (old [2]uint64)

But there would not be corresponding top-level functions, to avoid direct access to the atomic uint64 values.

Edit: Added Swap.

someview commented 1 year ago

How about add something like MarkableReference and TimeStampedReference in java,just extend the val :

func CompareAndSwapUint64Pointer(oldPtr unsafe.Pointer, newPtr unsafe.Pointer, new1, new2 uint64) (swapped bool)

This is useful for lock free data structure: hashmap.add

rsc commented 1 year ago

@someview, I am not sure I understand what you mean by "just extend the val" here. It sounds like you mean assume that oldPtr and newPtr both point to a pair of uint64 values, but if we do that, then there will be a problem on 32-bit systems (which must use locks to implement cas128) that the individual uint64s will be available for ordinary reads or even atomic.LoadUint64, neither of which will be correct since they will not use the lock too. The point of the Uint64Pair type is to hide the actual uint64 values behind an abstraction that only allows using them in this specific way.

someview commented 1 year ago

If cas128 must support

@someview, I am not sure I understand what you mean by "just extend the val" here. It sounds like you mean assume that oldPtr and newPtr both point to a pair of uint64 values, but if we do that, then there will be a problem on 32-bit systems (which must use locks to implement cas128) that the individual uint64s will be available for ordinary reads or even atomic.LoadUint64, neither of which will be correct since they will not use the lock too. The point of the Uint64Pair type is to hide the actual uint64 values behind an abstraction that only allows using them in this specific way.

This may be double-word CAS,not atomic128 with 32-bit system

rsc commented 1 year ago

It sounds like people are generally happy with https://github.com/golang/go/issues/61236#issuecomment-1681044578. Have all remaining concerns about this proposal been addressed?

randall77 commented 1 year ago

We probably want Swap also, the other types have that.

rsc commented 1 year ago

Added Swap, thanks.

CAFxX commented 1 year ago

I suspect quite a few uses of a DWCAS would be to deal with pointers... Should we consider also PointerPair/[2]unsafe.Pointer in this proposal (with the caveat that the size will differ between 32 and 64 bit archs), or should we leave it for later?

ianlancetaylor commented 1 year ago

I suggest that PointerPair be a different proposal, since, as you say, it's not necessarily a 128-bit issue. People might in principle be running into it today on 32-bit systems. That separate proposal should ideally point out some algorithms that would benefit from a PointerPair implementation. Thanks.

rsc commented 1 year ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

zephyrtronium commented 1 year ago

Somehow I missed this until now. To echo @CAFxX, I'm not really sure what algorithms can actually make use of uint64 pairs in particular, i.e. without needing any pointers in the pair. The classical wait-free linked list algorithm using atomic pairs needs one or two pointers, depending on the element type. wCQ seems to need zero or one depending on the element type. wFQ doesn't actually seem to use 128-bit atomics, unless I missed it in my skim, and if it would then at least one would be a pointer.

@haraldrudell's opening post says, "what is desired is a performant wait-free queue and wait-free free-list stack." That doesn't seem to be possible under unsafe.Pointer rules with just integer operations, barring implementing a whole manual allocator to circumvent Go's garbage collector.

So, it seems like the only potential use case of this is a fixed-buffer wait-free queue with uint64 elements. That seems dubious to me. What am I missing?

zephyrtronium commented 1 year ago

I have been thinking more about this. Is there any reason not to make the API generic?

package atomic
type Pair[Fst ~*FstElem | ~uintptr, Snd ~*SndElem | ~uintptr, FstElem, SndElem any] struct { ... }
func (*Pair[Fst, Snd, _, _]) Load() (Fst, Snd)
func (*Pair[Fst, Snd, _, _]) Store(Fst, Snd) 
func (*Pair[Fst, Snd, _, _]) CompareAndSwap(oldFst Fst, oldSnd Snd, newFst Fst, newSnd Snd) bool
func (*Pair[Fst, Snd, _, _]) Swap(Fst, Snd) (Fst, Snd)

Per https://github.com/golang/go/issues/61236#issuecomment-1701843267, I'm happy to open a separate proposal for this since it is not a 128-bit issue per se. I'm posting here first because I don't think the current proposal meets its own goals, whereas this does. The list of algorithms that would benefit is the same (unless someone has an example otherwise); it would only make the implementations of those algorithms much simpler and safer as well as being the same on 32-bit platforms as on 64-bit, except where uint64 elements in particular would appear.

rsc commented 1 year ago

The generic version seems untenable to me. How do I put one of those in a struct?

type MyType struct {
    Field atomic.Pair[*byte, *int, byte, int]
}

is pretty awkward. And what if one type is uintptr?

    Field atomic.Pair[uintptr, *int, ????, int]

What goes in the ????? I guess we can provide anything but then explaining why 'any' appears here:

    Field atomic.Pair[uintptr, *int, any, int]

is very awkward.

We don't have answers for these questions in the current generics, nor in the future plans.

If we don't try to generify, it seems like we need three types: Uint64Pair, PointerPair[T1, T2], and Uint64AndPointer[T]. That's not ideal, but it works, and this is sync/atomic, which is never that pretty.

CAFxX commented 1 year ago

Uint64AndPointer[T]

This should maybe be UintptrAndPointer[T], considering 32 bit architectures.

rsc commented 1 year ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

rsc commented 1 year ago

Right now it looks like the proposal is to add three types with this API:

package atomic
type Uint64Pair struct { ... unexported ... }
func (x *Uint64Pair) Load() (uint64, uint64)
func (x *Uint64Pair) Store(uint64, uint64) 
func (x *Uint64Pair) CompareAndSwap(old1, old2, new1, new2 uint64) bool
func (x *Uint64Pair) Swap(new1, new2 uint64) (old1, old2 uint64)

type PointerPair[T1, T2 any] struct { ... unexported ... }
func (x *PointerPair[T1, T2]) Load() (*T1, *T2)
func (x *PointerPair[T1, T2]) Store(new1 *T1, new2 *T2) 
func (x *PointerPair[T1, T2]) CompareAndSwap(old1 *T1, old2 *T2, new1 *T1, new2 *T2) bool
func (x *PointerPair[T1, T2]) Swap(new1 *T1, new2 *T2) (old1 *T1, old2 *T2)

type Uint64AndPointer[T1 any] struct { ... unexported ... }
func (x *Uint64AndPointer[T1]) Load() (uint64, *T1)
func (x *Uint64AndPointer[T1]) Store(uint64, *T1)
func (x *Uint64AndPointer[T1]) CompareAndSwap(old1 uint64, old2 *T1, new1 uint64, new2 *T2) bool
func (x *Uint64AndPointer[T1]) Swap(new1 uint64, new2 *T1) (old1 uint64, old2 *T1)

And maybe also UintptrAndPointer.

The arrays are gone because they only apply in Uint64Pair.

It's not clear if this is the road we want to go down, but I don't see another one.

rsc commented 1 year ago

This is the API we think we'd use. Is there anyone who has a use case that isn't addressed by this? And is there anyone who has a use case that is addressed by this? (We want to make sure we're not adding something no one needs.) Thanks!

zephyrtronium commented 1 year ago

PointerPair with CAS makes it straightforward to implement an efficient lock-free linked list with arbitrary elements, which directly addresses a use case I have. Uint64AndPointer does the same for uint64 elements, which is probably still useful. It's a bit of a shame that having them as separate types means that the same type can't provide both varieties of linked list, but I'm not sure how common the latter is anyway.

If @haraldrudell or someone else could check that my reading of wCQ in https://github.com/golang/go/issues/61236#issuecomment-1710637923 is correct, that would confirm another use case for Uint64AndPointer.

rsc commented 1 year ago

Have all remaining concerns about this proposal been addressed?

Add to sync/atomic:

package atomic

// A Uint64Pair is an atomic pair of uint64 values.
// The zero value is a pair of zeros.
type Uint64Pair struct { ... unexported ... }

// Load atomically loads and returns the pair stored in x.
func (x *Uint64Pair) Load() (v1, v2 uint64)

// Store atomically stores the pair v1, v2 into x.
func (x *Uint64Pair) Store(v1, v2 uint64) 

// CompareAndSwap executes the compare-and-swap operation for x.
func (x *Uint64Pair) CompareAndSwap(old1, old2, new1, new2 uint64) bool

// Swap atomically stores new1, new2 into x and returns the old pair.
func (x *Uint64Pair) Swap(new1, new2 uint64) (old1, old2 uint64)

// A PointerPair is an atomic pair of pointer values.
// The zero value is a pair of nils.
type PointerPair[T1, T2 any] struct { ... unexported ... }

// Same doc comments as above.
func (x *PointerPair[T1, T2]) Load() (v1 *T1, v2 *T2)
func (x *PointerPair[T1, T2]) Store(v1 *T1, v2 *T2) 
func (x *PointerPair[T1, T2]) CompareAndSwap(old1 *T1, old2 *T2, new1 *T1, new2 *T2) bool
func (x *PointerPair[T1, T2]) Swap(new1 *T1, new2 *T2) (old1 *T1, old2 *T2)

// A Uint64AndPointer is an atomic pair consisting of a uint64 and a pointer.
// The zero value is a zero uint64 and a nil pointer.
type Uint64AndPointer[T any] struct { ... unexported ... }

// Same doc comments as above.
func (x *Uint64AndPointer[T]) Load() (v1 uint64, v2 *T)
func (x *Uint64AndPointer[T]) Store(v1 uint64, v2 *T)
func (x *Uint64AndPointer[T]) CompareAndSwap(old1 uint64, old2 *T, new1 uint64, new2 *T) bool
func (x *Uint64AndPointer[T]) Swap(new1 uint64, new2 *T) (old1 uint64, old2 *T)
rsc commented 1 year ago

Note that like with atomic.Int64, the compiler and toolchain will provide the necessary alignment automatically, so there's no need to worry about it or mention it here.

rsc commented 1 year ago

This seems like a likely accept but we may well not get to it until Go 1.23 at this point.

rsc commented 1 year ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

Add to sync/atomic:

package atomic

// A Uint64Pair is an atomic pair of uint64 values.
// The zero value is a pair of zeros.
type Uint64Pair struct { ... unexported ... }

// Load atomically loads and returns the pair stored in x.
func (x *Uint64Pair) Load() (v1, v2 uint64)

// Store atomically stores the pair v1, v2 into x.
func (x *Uint64Pair) Store(v1, v2 uint64) 

// CompareAndSwap executes the compare-and-swap operation for x.
func (x *Uint64Pair) CompareAndSwap(old1, old2, new1, new2 uint64) bool

// Swap atomically stores new1, new2 into x and returns the old pair.
func (x *Uint64Pair) Swap(new1, new2 uint64) (old1, old2 uint64)

// A PointerPair is an atomic pair of pointer values.
// The zero value is a pair of nils.
type PointerPair[T1, T2 any] struct { ... unexported ... }

// Same doc comments as above.
func (x *PointerPair[T1, T2]) Load() (v1 *T1, v2 *T2)
func (x *PointerPair[T1, T2]) Store(v1 *T1, v2 *T2) 
func (x *PointerPair[T1, T2]) CompareAndSwap(old1 *T1, old2 *T2, new1 *T1, new2 *T2) bool
func (x *PointerPair[T1, T2]) Swap(new1 *T1, new2 *T2) (old1 *T1, old2 *T2)

// A Uint64AndPointer is an atomic pair consisting of a uint64 and a pointer.
// The zero value is a zero uint64 and a nil pointer.
type Uint64AndPointer[T any] struct { ... unexported ... }

// Same doc comments as above.
func (x *Uint64AndPointer[T]) Load() (v1 uint64, v2 *T)
func (x *Uint64AndPointer[T]) Store(v1 uint64, v2 *T)
func (x *Uint64AndPointer[T]) CompareAndSwap(old1 uint64, old2 *T, new1 uint64, new2 *T) bool
func (x *Uint64AndPointer[T]) Swap(new1 uint64, new2 *T) (old1 uint64, old2 *T)
rsc commented 12 months ago

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

Add to sync/atomic:

package atomic

// A Uint64Pair is an atomic pair of uint64 values.
// The zero value is a pair of zeros.
type Uint64Pair struct { ... unexported ... }

// Load atomically loads and returns the pair stored in x.
func (x *Uint64Pair) Load() (v1, v2 uint64)

// Store atomically stores the pair v1, v2 into x.
func (x *Uint64Pair) Store(v1, v2 uint64) 

// CompareAndSwap executes the compare-and-swap operation for x.
func (x *Uint64Pair) CompareAndSwap(old1, old2, new1, new2 uint64) bool

// Swap atomically stores new1, new2 into x and returns the old pair.
func (x *Uint64Pair) Swap(new1, new2 uint64) (old1, old2 uint64)

// A PointerPair is an atomic pair of pointer values.
// The zero value is a pair of nils.
type PointerPair[T1, T2 any] struct { ... unexported ... }

// Same doc comments as above.
func (x *PointerPair[T1, T2]) Load() (v1 *T1, v2 *T2)
func (x *PointerPair[T1, T2]) Store(v1 *T1, v2 *T2) 
func (x *PointerPair[T1, T2]) CompareAndSwap(old1 *T1, old2 *T2, new1 *T1, new2 *T2) bool
func (x *PointerPair[T1, T2]) Swap(new1 *T1, new2 *T2) (old1 *T1, old2 *T2)

// A Uint64AndPointer is an atomic pair consisting of a uint64 and a pointer.
// The zero value is a zero uint64 and a nil pointer.
type Uint64AndPointer[T any] struct { ... unexported ... }

// Same doc comments as above.
func (x *Uint64AndPointer[T]) Load() (v1 uint64, v2 *T)
func (x *Uint64AndPointer[T]) Store(v1 uint64, v2 *T)
func (x *Uint64AndPointer[T]) CompareAndSwap(old1 uint64, old2 *T, new1 uint64, new2 *T) bool
func (x *Uint64AndPointer[T]) Swap(new1 uint64, new2 *T) (old1 uint64, old2 *T)
Zheng-Xu commented 12 months ago

In https://pkg.go.dev/sync/atomic#pkg-note-BUG , it is said

On ARM, 386, and 32-bit MIPS, it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically via the primitive atomic functions (types Int64 and Uint64 are automatically aligned). The first word in an allocated struct, array, or slice; in a global variable; or in a local variable (because the subject of all atomic operations will escape to the heap) can be relied upon to be 64-bit aligned.

So far, we don't have 16-byte aligned data structure in Go. Will we provide the capability to cast something else to Uint64Pair, for example the address of a struct to Uint64Pair?

More specifically, can we guarantee below Load() is atomic?

var comp complex128
(*atomic. Uint64Pair)(unsafe.Pointer(&comp)).Load()
cherrymui commented 11 months ago

@Zheng-Xu as commented in https://github.com/golang/go/issues/61236#issuecomment-1789380904 , atomic. Uint64Pair will be properly aligned, to 16-byte if the architecture's atomic operations require it. That said, the alignment of complex128 is not changed, so that unsafe conversion is not safe.

mauri870 commented 11 months ago

I would be happy to work on this for the 1.23 cycle, given that I have previous experience implementing internal and sync/atomic APIs. I'll keep it assigned unless someone else is interested.