Proposal: Atomic<T> (corefx)

benaadams commented 7 years ago

Atomic<T>

Background

x64 has been able to do 128-bit (16-byte) Interlocked CompareExchanges for a while, and it's a CPU requirement for installing Windows 8.1 x64 and Windows 10 x64. It's also available on MacOS/OSX and Linux. Its availability would allow for more interesting lock-less algorithms.

I was trying to think whether there was an easy fallback for https://github.com/dotnet/corefx/issues/10478, but since it's struct-based there is nothing to lock.

So additionally, it would be good to have a type Atomic<T> which mirrors the C++11 std::atomic type, adds some .NET magic, and can fall back to locks when the data width is not supported on the platform.

Proposed Api

Interface, struct, class, extensions (and some reference manipulation types)

namespace System.Threading.Atomics
{
    public interface IAtomic<T>  where T : IEquatable<T>
    {
        T Read();
        T Read(MemoryOrder order);
        void Write(T value);
        void Write(T value, MemoryOrder order);

        T Exchange(T value);
        T Exchange(T value, MemoryOrder order);

        bool CompareExchangeWeak(T value, T comperand);
        bool CompareExchangeWeak(T value, T comperand, MemoryOrder order);

        bool CompareExchangeStrong(T value, T comperand);
        bool CompareExchangeStrong(T value, T comperand, MemoryOrder order);

        // MemoryOrder variants skipped for brevity

        // Unsafe transformations, bypass the atomicity
        T UnsafeTransform(AtomicTransform<T> transformation);
        T UnsafeTransform(AtomicTransformParam<T> transformation, T val);

        // Atomic transformations, Func should be pure and repeatable in case of retry

        // Pure transform
        T Transform(Func<T, T> transformation);
        T Transform<TState>(Func<T, TState, T> transformation, TState state);

        // Conditional transforms e.g. Increment but only while < N
        T Transform(Func<T, T> transformation, Func<T, bool> condition);
        T Transform<TState>(Func<T, T> transformation, Func<T, TState, bool> condition, TState state);

        // Same data transform, apply if value is what is expected
        T Transform(Func<T, T> transformation, T comperand);
        T Transform<TState>(Func<T, TState, T> transformation, TState state, T comperand);
    }

    public delegate T AtomicTransform<T>(ref T input);
    public delegate T AtomicTransformParam<T>(ref T input, T val);

    public enum MemoryOrder
    {
        Relaxed,
        Consume,
        Acquire,
        Release,
        AcquireRelease,
        SequentiallyConsistent
    }
}

Atomic struct

[Disallow(Stack,Boxing)]
public struct ValueAtomic<T> : IAtomic<T> where T : IEquatable<T>
{
    private T _data;
    // allocated if not supported lock-free natively
    private object _lock;

    [JitIntrinsic]
    public static bool IsLockFree { get; }

    public ValueAtomic(T value)
    {
        _data = value;
        _lock = IsLockFree ? null : new object();
    }

    public T Read();
    public T Read(MemoryOrder order);
    public void Write(T value);
    public void Write(T value, MemoryOrder order);
    public T Exchange(T value);
    public T Exchange(T value, MemoryOrder order);
    public bool CompareExchangeWeak(T value, T comperand);
    public bool CompareExchangeWeak(T value, T comperand, MemoryOrder order);
    public bool CompareExchangeStrong(T value, T comperand);
    public bool CompareExchangeStrong(T value, T comperand, MemoryOrder order);
    public unsafe T UnsafeTransform(AtomicTransform<T> transformation)
        => transformation(ref _data);
    public unsafe T UnsafeTransform(AtomicTransformParam<T> transformation, T val)
        => transformation(ref _data, val);
    public T Transform(Func<T, T> transformation);
    public T Transform<TState>(Func<T, TState, T> transformation, TState state);
    public T Transform(Func<T, T> transformation, Func<T, bool> condition);
    public T Transform<TState>(Func<T, T> transformation, Func<T, TState, bool> condition, TState state)
    {
        // Sketch: a CAS loop built on CompareExchangeStrong, so it works for any T
        // (Interlocked.CompareExchange cannot operate on an arbitrary struct T).
        T current = Read();
        while (condition(current, state))
        {
            T next = transformation(current);
            if (CompareExchangeStrong(next, current))
            {
                return next;
            }
            current = Read();
        }
        // Condition no longer holds; return the value that failed it.
        return current;
    }
    public T Transform(Func<T, T> transformation, T comperand);
    public T Transform<TState>(Func<T, TState, T> transformation, TState state, T comperand);

    public static implicit operator T(ValueAtomic<T> atom) => atom.Read();
}
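The ValueAtomic<T> methods above are declared without bodies. As a minimal sketch of the lock fallback path only (what would run when IsLockFree is false), the core operations and a conditional Transform could look like the following. LockedAtomic<T> is a hypothetical name, not part of the proposal, and it is a class rather than a struct so that copies share the same storage.

```csharp
using System;

// Hypothetical lock-based fallback (not the proposal's implementation): the path a
// ValueAtomic<T> would take when IsLockFree is false. A class is used so copies share state.
public sealed class LockedAtomic<T> where T : IEquatable<T>
{
    private readonly object _lock = new object();
    private T _data;

    public LockedAtomic(T value) => _data = value;

    public T Read() { lock (_lock) return _data; }

    public void Write(T value) { lock (_lock) _data = value; }

    // Stores value and returns the previous value.
    public T Exchange(T value)
    {
        lock (_lock)
        {
            T old = _data;
            _data = value;
            return old;
        }
    }

    // Stores value only if the current value equals comparand; reports whether it did.
    public bool CompareExchangeStrong(T value, T comparand)
    {
        lock (_lock)
        {
            if (!_data.Equals(comparand)) return false;
            _data = value;
            return true;
        }
    }

    // Conditional transform as a CAS loop over CompareExchangeStrong, so the same loop
    // would also work over a lock-free backend. Returns the new value on success, or
    // the current value once the condition no longer holds (assumed semantics).
    public T Transform(Func<T, T> transformation, Func<T, bool> condition)
    {
        T current = Read();
        while (condition(current))
        {
            T next = transformation(current);
            if (CompareExchangeStrong(next, current)) return next;
            current = Read();
        }
        return current;
    }
}
```

Writing Transform on top of CompareExchangeStrong, rather than holding the lock for the whole call, keeps the loop identical whether the backend is a lock or a native CAS; only CompareExchangeStrong and Read would change.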

Atomic class (struct wrapper)

public class Atomic<T> : IAtomic<T> where T : IEquatable<T>
{
    private ValueAtomic<T> _atom;
    public static bool IsLockFree => ValueAtomic<T>.IsLockFree;

    public Atomic(T value)
    {
        _atom = new ValueAtomic<T>(value);
    }

    public T Read()
        => _atom.Read();
    public T Read(MemoryOrder order)
        => _atom.Read(order);
    public void Write(T value)
        => _atom.Write(value);
    public void Write(T value, MemoryOrder order)
        => _atom.Write(value, order);
    public T Exchange(T value)
        => _atom.Exchange(value);
    public T Exchange(T value, MemoryOrder order)
        => _atom.Exchange(value, order);
    public bool CompareExchangeWeak(T value, T comperand)
        => _atom.CompareExchangeWeak(value, comperand);
    public bool CompareExchangeWeak(T value, T comperand, MemoryOrder order)
        => _atom.CompareExchangeWeak(value, comperand, order);
    public bool CompareExchangeStrong(T value, T comperand)
        => _atom.CompareExchangeStrong(value, comperand);
    public bool CompareExchangeStrong(T value, T comperand, MemoryOrder order)
        => _atom.CompareExchangeStrong(value, comperand, order);
    public unsafe T UnsafeTransform(AtomicTransform<T> transformation)
        => _atom.UnsafeTransform(transformation);
    public unsafe T UnsafeTransform(AtomicTransformParam<T> transformation, T val)
        => _atom.UnsafeTransform(transformation, val);
    public T Transform(Func<T, T> transformation)
        => _atom.Transform(transformation);
    public T Transform<TState>(Func<T, TState, T> transformation, TState state)
        => _atom.Transform(transformation, state);
    public T Transform(Func<T, T> transformation, Func<T, bool> condition)
        => _atom.Transform(transformation, condition);
    public T Transform<TState>(Func<T, T> transformation, Func<T, TState, bool> condition, TState state)
        => _atom.Transform(transformation, condition, state);
    public T Transform(Func<T, T> transformation, T comperand)
        => _atom.Transform(transformation, comperand);
    public T Transform<TState>(Func<T, TState, T> transformation, TState state, T comperand) 
        => _atom.Transform(transformation, state, comperand);

    public static implicit operator T(Atomic<T> atom) => atom.Read();
}

Numeric Extensions

public static class AtomicNumericExtensions
{
    // For byte, short, ushort, uint, int, long, ulong, single, double

    public static int Add(this Atomic<int> atom, int value);
    public static int Subtract(this Atomic<int> atom, int value);
    public static int Multiply(this Atomic<int> atom, int value);
    public static int Divide(this Atomic<int> atom, int value);

    // For byte, short, ushort, uint, int, long, ulong

    public static int Increment(this Atomic<int> atom);
    public static int Increment(this Atomic<int> atom, int max);

    public static int Decrement(this Atomic<int> atom);
    public static int Decrement(this Atomic<int> atom, int min);

    public static int And(this Atomic<int> atom, int value);
    public static int Or(this Atomic<int> atom, int value);
    public static int Xor(this Atomic<int> atom, int value);
    public static int Not(this Atomic<int> atom);
}

Bool Extensions

public static class AtomicBoolExtensions
{
    public static bool TestAndSet(this Atomic<bool> atom);
    public static bool Clear(this Atomic<bool> atom);
}
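A minimal sketch of the TestAndSet/Clear semantics, assuming both return the flag's previous value (the proposal does not say what Clear returns). AtomicFlag is a hypothetical type; it is backed by an int because Interlocked.Exchange(ref int, int) is available on every .NET version.

```csharp
using System.Threading;

// Hypothetical flag mirroring AtomicBoolExtensions, backed by an int.
public sealed class AtomicFlag
{
    private int _value; // 0 = clear, 1 = set

    // Sets the flag; returns true if it was already set (like C++ atomic_flag::test_and_set).
    public bool TestAndSet() => Interlocked.Exchange(ref _value, 1) == 1;

    // Clears the flag; returns true if it was set beforehand (assumed semantics).
    public bool Clear() => Interlocked.Exchange(ref _value, 0) == 1;
}
```

With these semantics a simple spin lock is `while (flag.TestAndSet()) Thread.SpinWait(1);` to acquire and `flag.Clear();` to release.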

Atomic Flagged References

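public struct FlaggedReference<TRef>
    : IEquatable<FlaggedReference<TRef>>
    where TRef : class
{
    public TRef Reference { get; set; }
    public bool Flag { get; set; }

    public bool Equals(FlaggedReference<TRef> other)
        => ReferenceEquals(Reference, other.Reference)
            && Flag == other.Flag;
}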

public static class AtomicFlaggedReferenceExtensions
{
    public static bool TestAndSet<TRef>(this Atomic<FlaggedReference<TRef>> atom)
                        where TRef : class;
    public static bool TestAndSet<TRef>(
                        this Atomic<FlaggedReference<TRef>> atom,
                        TRef expectedReference)
                        where TRef : class;
    public static bool Clear<TRef>(this Atomic<FlaggedReference<TRef>> atom)
                        where TRef : class;
    public static bool Clear<TRef>(
                        this Atomic<FlaggedReference<TRef>> atom,
                        TRef expectedReference)
                        where TRef : class;
}

Atomic Versioned References

public struct VersionedReference<TRef>
    : IEquatable<VersionedReference<TRef>>
    where TRef : class
{
    public TRef Reference { get; set; }
    public long Version { get; set; }

    public bool Equals(VersionedReference<TRef> other)
        => ReferenceEquals(Reference, other.Reference) 
            && Version == other.Version;

    public static implicit operator TRef(VersionedReference<TRef> atom) => atom.Reference;
}

public static class AtomicVersionedReferenceExtensions
{
    public static VersionedReference<TRef> Increment<TRef>(
                    this Atomic<VersionedReference<TRef>> atom)
                    where TRef : class;
    public static VersionedReference<TRef> Increment<TRef>(
                    this Atomic<VersionedReference<TRef>> atom,
                    TRef expectedReference)
                    where TRef : class;
    public static VersionedReference<TRef> UpdateIncrement<TRef>(
                    this Atomic<VersionedReference<TRef>> atom,
                    VersionedReference<TRef> newRefExpectedVersion)
                    where TRef : class;
}

Atomic Tagged References

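public struct TaggedReference<TRef, TTag>
    : IEquatable<TaggedReference<TRef, TTag>>
    where TRef : class
    where TTag : struct
{
    public TRef Reference { get; set; }
    public TTag Tag { get; set; }

    public bool Equals(TaggedReference<TRef, TTag> other)
        => ReferenceEquals(Reference, other.Reference)
            && EqualityComparer<TTag>.Default.Equals(Tag, other.Tag);
}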

public static class AtomicTaggedReferenceExtensions
{
    public static TaggedReference<TRef, TTag> Update<TRef, TTag>(
                    this Atomic<TaggedReference<TRef, TTag>> atom,
                    TaggedReference<TRef, TTag> newTaggedReference,
                    TRef expectedReference)
                    where TRef : class where TTag : struct;
    public static TaggedReference<TRef, TTag> Update<TRef, TTag>(
                    this Atomic<TaggedReference<TRef, TTag>> atom,
                    TRef newReference,
                    TRef expectedReference)
                    where TRef : class where TTag : struct;
    public static TaggedReference<TRef, TTag> Update<TRef, TTag>(
                    this Atomic<TaggedReference<TRef, TTag>> atom,
                    TTag newTag,
                    TRef expectedReference)
                    where TRef : class where TTag : struct;
    public static TaggedReference<TRef, TTag> Update<TRef, TTag>(
                    this Atomic<TaggedReference<TRef, TTag>> atom,
                    TaggedReference<TRef, TTag> newTaggedReference,
                    TTag expectedTag)
                    where TRef : class where TTag : struct;
    public static TaggedReference<TRef, TTag> Update<TRef, TTag>(
                    this Atomic<TaggedReference<TRef, TTag>> atom,
                    TRef newReference,
                    TTag expectedTag)
                    where TRef : class where TTag : struct;
    public static TaggedReference<TRef, TTag> Update<TRef, TTag>(
                    this Atomic<TaggedReference<TRef, TTag>> atom,
                    TTag newTag,
                    TTag expectedTag)
                    where TRef : class where TTag : struct;
    public static TaggedReference<TRef, TTag> Update<TRef, TTag>(
                    this Atomic<TaggedReference<TRef, TTag>> atom,
                    TRef newReference,
                    TaggedReference<TRef, TTag> expectedTaggedReference)
                    where TRef : class where TTag : struct;
    public static TaggedReference<TRef, TTag> Update<TRef, TTag>(
                    this Atomic<TaggedReference<TRef, TTag>> atom,
                    TTag newTag,
                    TaggedReference<TRef, TTag> expectedTaggedReference)
                    where TRef : class where TTag : struct;
    // essentially CompareExchange
    public static TaggedReference<TRef, TTag> Update<TRef, TTag>(
                    this Atomic<TaggedReference<TRef, TTag>> atom,
                    TaggedReference<TRef, TTag> newTaggedReference,
                    TaggedReference<TRef, TTag> expectedTaggedReference)
                    where TRef : class where TTag : struct;
}

Atomic Double Reference

public struct DoubleReference<TRef> : IEquatable<DoubleReference<TRef>>
    where TRef : class
{
    public TRef ReferenceLeft { get; set; }
    public TRef ReferenceRight { get; set; }

    public bool Equals(DoubleReference<TRef> other)
        => ReferenceEquals(ReferenceLeft, other.ReferenceLeft) &&
            ReferenceEquals(ReferenceRight, other.ReferenceRight);
}

public static class DoubleReferenceExtensions
{
    public static DoubleReference<TRef> ExchangeLeft<TRef>(
                    this Atomic<DoubleReference<TRef>> atom,
                    TRef expectedRight) where TRef : class;
    public static DoubleReference<TRef> ExchangeRight<TRef>(
                    this Atomic<DoubleReference<TRef>> atom,
                    TRef expectedLeft) where TRef : class;
}

Transforms

The transforms give the flexibility to compose more complex atomic data structures; for example, the int Add, Increment and Increment-to-limit operations can be implemented as:

public static int Add<TAtomic>(this TAtomic atom, int value) 
    where TAtomic : IAtomic<int>
{
    return atom.UnsafeTransform(
        (ref int current, int inc) 
            => Interlocked.Add(ref current, inc), value);
}

public static int Increment<TAtomic>(this TAtomic atom) 
    where TAtomic : IAtomic<int>
{
    return atom.UnsafeTransform((ref int v) => Interlocked.Increment(ref v));
}

public static int Increment<TAtomic>(this TAtomic atom, int max) 
    where TAtomic : IAtomic<int>
{
    // increment only while the current value is below max
    return atom.Transform((current) => current + 1, (c, m) => c < m, max);
}

Or add new atomic operations to a type that previously had no atomic support:

public static Decimal Multiply<TAtomic>(this TAtomic atom, Decimal value)
   where TAtomic : IAtomic<Decimal>
{
    return atom.Transform((current, m) => current * m, value);
}
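Two details in the int extensions above are worth a sketch. When TAtomic is the struct ValueAtomic<int>, `this TAtomic atom` receives a copy, so UnsafeTransform would update the copy's _data rather than the caller's; C# 7.2 later added `this ref` extension methods on structs, which avoid that. And an "increment only while below max" condition needs `c < m`, since `c <= m` allows one increment past max. Below, the hypothetical AtomicIntCell stands in for ValueAtomic<int>, and the CAS loop uses Interlocked.CompareExchange(ref int, int, int).

```csharp
using System.Threading;

// Hypothetical stand-in for ValueAtomic<int>: a struct with a single int field.
public struct AtomicIntCell
{
    public int Value;
}

public static class AtomicIntCellExtensions
{
    // Increments only while the value is below max; returns true if it incremented.
    // `this ref` (C# 7.2+) means the CAS targets the caller's field or array element,
    // not a copy of the struct.
    public static bool TryIncrement(this ref AtomicIntCell cell, int max)
    {
        int current = Volatile.Read(ref cell.Value);
        while (current < max)
        {
            int observed = Interlocked.CompareExchange(ref cell.Value, current + 1, current);
            if (observed == current) return true; // our CAS won
            current = observed;                   // lost a race; retry with the fresh value
        }
        return false;
    }
}
```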

When T is 16 bytes, _data would need to be 16-byte aligned to allow lock-free use of LOCK CMPXCHG16B where available. Might it be easier to enforce alignment here than with https://github.com/dotnet/corefx/issues/10478?

VersionedReference and FlaggedReference should be 16-byte aligned inside Atomic (they don't need to be outside it), as should TaggedReference when the struct is <= 16 bytes.
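On x64, CMPXCHG16B raises a general-protection fault if its memory operand is not 16-byte aligned, which is why the alignment requirement matters. The arithmetic a runtime or allocator would need is a mask operation. A sketch with a hypothetical helper (not an existing .NET API):

```csharp
using System;

// Hypothetical helper; alignment must be a power of two (16 for CMPXCHG16B).
public static class Alignment
{
    // Rounds address up to the next multiple of alignment.
    public static ulong AlignUp(ulong address, ulong alignment)
    {
        if (alignment == 0 || (alignment & (alignment - 1)) != 0)
            throw new ArgumentException("alignment must be a power of two", nameof(alignment));
        return (address + alignment - 1) & ~(alignment - 1);
    }

    // True when address is already a multiple of alignment.
    public static bool IsAligned(ulong address, ulong alignment) => (address & (alignment - 1)) == 0;
}
```

For example, AlignUp(0x1001, 16) is 0x1010, while an address that is already aligned is returned unchanged.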

davidfowl commented 7 years ago

https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/package-summary.html

benaadams commented 7 years ago

Maybe a CompareExchange variant that takes a Func<T, T>?

bool Transform<TVal>(Func<TVal, TVal> transformation, T comperand) where TVal : struct, T;

Added

joshfree commented 7 years ago

cc @sergiy-k

clrjunkie commented 7 years ago

I think in this case it’s OK to skip the interface name suffix convention so things don’t read weirdly.

Consider changing ValueAtomic to AtomicValue.

Furthermore, the API really abstracts a memory location; consider renaming it to AtomicVariable (I think "atomic value" generally means something that can’t be further divided).

Consider adding 'Extensions' suffix to Extension Method classes.

benaadams commented 7 years ago

I think in this case it’s ok skip the interface name suffix convention so things don’t read weird. Consider changing ValueAtomic to AtomicValue.

There are 3 types, IAtomic, Atomic (class) and ValueAtomic (struct). The ValueAtomic follows the naming of ValueTuple and ValueTask for the struct based versions of Tuple and Task.

IAtomic exists so that generic constraints can cover both types, letting one set of extension methods apply to each.

Ideally I'd like ValueAtomic to be a struct that can only live on the heap (e.g. in an array of ValueAtomic) or be accessed through ref returns. On the stack it would have a similar problem to SpinLock: it serves no purpose there, and copying it introduces errors.

the api really abstracts a memory location

It can represent any type at any size; it just falls back to a lock when the operation can't be covered by a CAS or equivalent (e.g. TSX or LL/SC).

Consider adding 'Extensions' suffix to Extension Method classes.

Done

GSPP commented 7 years ago

An alternative design would be to add the following:

Add all these methods in the form of static methods.
Support specifying memory alignment for structs.

That's more flexible than demanding that a certain type must be used.

benaadams commented 7 years ago

Add all these methods in the form of static methods ... That's more flexible than demanding that a certain type must be used.

That doesn't allow fallbacks: a 16-byte struct may support CAS on x64 but not on x86, and a 64-byte struct may support an atomic update with TSX on Intel Skylake but not on earlier CPUs, AMD, or ARM.

As a user I just want an atomically updating data structure without worrying about how it's done, whether by CAS, LL/SC, transactional memory, etc.

As a power user I may want to check IsLockFree and use a different strategy if it is false.

With the Atomic types, I could define a 256-byte struct and have it happily updated atomically just by enclosing it in the type:

Atomic<struct256byte> LargeStruct = new Atomic<struct256byte>();
var val = LargeStruct.Read();
val.x = 15;
val.name = "www";
LargeStruct.Write(val);

And if that became a lock-free operation on a platform at some point, the framework could be updated to take advantage of it, and no user code would need to change.
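Each Read and Write in the snippet above is atomic on its own, but the pair is not one atomic step: two threads doing read-modify-write can overwrite each other's changes. The proposal's Transform covers that case. A sketch assuming a lock-based backing, with hypothetical LockedCell<T> and BigStruct (standing in for struct256byte):

```csharp
using System;

// Hypothetical stand-in for struct256byte.
public struct BigStruct
{
    public long X;
    public string Name;
}

// Hypothetical lock-based cell: each call is atomic, but Read followed by Write is not.
public sealed class LockedCell<T>
{
    private readonly object _lock = new object();
    private T _data;

    public LockedCell(T value) => _data = value;

    public T Read() { lock (_lock) return _data; }
    public void Write(T value) { lock (_lock) _data = value; }

    // Applies the transformation while holding the lock, so no update is lost.
    public T Transform(Func<T, T> transformation)
    {
        lock (_lock)
        {
            _data = transformation(_data);
            return _data;
        }
    }
}
```

Four threads each applying `s => { s.X++; return s; }` 10,000 times through Transform always end at X == 40000; replacing Transform with a Read followed by a Write can finish lower because of lost updates.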

clrjunkie commented 7 years ago

The ValueAtomic follows the naming of ValueTuple and ValueTask for the struct based versions of Tuple and Task.

Why? How are these related to this api?

It can represent any type at any size; just will fallback to a lock when it can't be covered by a CAS or equivalent (e.g. TSX/LLSC).

So it abstracts a location in memory; it seems natural to name it as such (e.g. AtomicVariable).

P.S.

Not planning to argue about it further... just food for thought.

benaadams commented 7 years ago

The ValueAtomic follows the naming of ValueTuple and ValueTask for the struct based versions of Tuple and Task.

Why? How are these related to this api?

As there are two types:

Atomic<decimal> objectAtomicDecimal = new Atomic<decimal>();
ValueAtomic<decimal> valueTypeAtomicDecimal = new ValueAtomic<decimal>();

The class type (on the heap) is for passing as a parameter; the value type is for embedding in a class (heap) or using in arrays (heap).

clrjunkie commented 7 years ago

Aha... I hadn't paid much attention to the Atomic class (wrapper).

clrjunkie commented 7 years ago

@benaadams

BTW.. I don't recall encountering code where an atomic variable was passed around as an argument for anything other than passing it to an Interlocked method... even while reading through Java code, I only recall noticing their atomic abstraction used at the field level. Have you encountered any usage that justifies the 'class version'? It might be worth posting an example here to be included in the API docs as "sample usage". Genuinely interested.

GSPP commented 7 years ago

There's no need for a copy of the struct to put it on the heap. The framework has Box<T> for putting anything on the heap.

benaadams commented 7 years ago

@clrjunkie Unless it could be enforced that a stack copy of the ValueAtomic couldn't be taken, I'd see Atomic as the go-to type: the one a user would reach for first due to the simpler name, and one that carries less coding risk.

Say you had an atomic subtract extension for decimal which took a minimum value:

static ValueTuple<bool, decimal> Subtract<TAtomic>(this TAtomic atom,
    decimal value,
    decimal min)
    where TAtomic : IAtomic<decimal>;

Then suppose you took a function-local copy:

class Account
{
    ValueAtomic<decimal> _balance = new ValueAtomic<decimal>();

    public bool Withdraw(decimal amount, out string message)
    {
        var balance = _balance; // uh oh - different copy

        var result = balance.Subtract(amount, 0);

        var success = result.Item1;
        var currentBalance = result.Item2;

        if (success)
        {
            message = $"{amount} was withdrawn from your account and you have {currentBalance} remaining";
        }
        else
        {
            message = $"You only have {currentBalance}, which is insufficient to withdraw {amount} "
                        + $"as that would leave you with {currentBalance - amount}";
        }

        return success;
    }
}

You might be confused about why the actual _balance had not changed. If you were using the class version, the results would be as expected.

So I'd prefer the struct version, but the class version comes with fewer hazards; hence offer both.
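A minimal, runnable illustration of the copy hazard in the Withdraw example, assuming hypothetical BalanceStruct and BalanceClass types that hold the same atomic field (a long in cents, since Interlocked has no decimal overloads):

```csharp
using System.Threading;

// Hypothetical stand-ins: the same atomic field inside a struct and inside a class.
public struct BalanceStruct
{
    public long Cents;
    public long Subtract(long amount) => Interlocked.Add(ref Cents, -amount);
}

public sealed class BalanceClass
{
    public long Cents;
    public long Subtract(long amount) => Interlocked.Add(ref Cents, -amount);
}

public sealed class Account
{
    public BalanceStruct StructBalance = new BalanceStruct { Cents = 1000 };
    public BalanceClass ClassBalance = new BalanceClass { Cents = 1000 };

    public void WithdrawViaLocalCopies(long amount)
    {
        var s = StructBalance; // copies the struct: the subtraction hits the copy
        s.Subtract(amount);

        var c = ClassBalance;  // copies the reference: same object, update is visible
        c.Subtract(amount);
    }
}
```

After WithdrawViaLocalCopies(100), StructBalance.Cents is still 1000 while ClassBalance.Cents is 900.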

even while reading through java code I only recall noticing their atomic abstraction used at the field level

Does Java have value types? Isn't almost everything an object type?

benaadams commented 7 years ago

The framework has Box<T> for putting anything on the heap.

Box<ValueAtomic<decimal>> balance = new Box<ValueAtomic<decimal>>(new ValueAtomic<decimal>(1000));
// ...
balance.Value.Subtract(10, 0);

That is just unhappy code... So you'd probably write:

Box<ValueAtomic<decimal>> objectBalance = new Box<ValueAtomic<decimal>>(new ValueAtomic<decimal>(1000));
// ...
var balance = objectBalance.Value;
balance.Subtract(10, 0);

And now you are in trouble, for the reason mentioned in the previous comment: you are operating on a different instance.

tenor commented 7 years ago

I did something similar with InterlockedBoolean, which is a great convenience structure for storing atomic boolean values.

However, it can't be passed around since it's an ordinary struct. Perhaps the compiler could issue an error/warning if an instance of the proposed value type is passed around.

I also think the names InterlockedValue and InterlockedObject would be more congruent with .NET's InterlockedXXX naming style.

clrjunkie commented 7 years ago

@benaadams

Actually, I started to think about it the other way around: go with the class and drop the struct, precisely because of the boxing penalty. I don't see the need for a huge array of atomics, and the boxing conversion penalty through both IAtomic<T> and the T : IEquatable<T> constraint can have a negative impact overall.

And why would anyone do this:

var balance = _balance; // uh oh - different copy

and not work with the _balance directly to extract the wrapped value via the Read method?

You mentioned that the motivation for the class version was passing it as a parameter, and my understanding was that your intention was to reduce the copy size. Java currently has no user-defined value types, and my point was that I didn't see those atomics passed around anyway, so I didn't understand the motivation compared to a field-allocated struct.

benaadams commented 7 years ago

@clrjunkie people take different approaches to things, and it's to leave the flexibility open. It depends whether you are taking an object-oriented approach, adding 12 bytes plus a GC handle to every object, or a data-oriented approach for fast batch processing where you'd stream through arrays of data.

Ref returns, for example, fit better with a data-oriented approach, and you could ref-return a ValueAtomic from the array.
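A sketch of the ref-return pattern, assuming a hypothetical LongCell struct standing in for ValueAtomic<long> and a hypothetical CellTable wrapping the array: the atomic operation goes through a ref to the array element, while assigning the result to a `var` takes a copy.

```csharp
using System.Threading;

// Hypothetical stand-in for ValueAtomic<long>, stored inline in an array.
public struct LongCell
{
    public long Value;
}

public sealed class CellTable
{
    private readonly LongCell[] _cells;

    public CellTable(int count) => _cells = new LongCell[count];

    // ref return (C# 7.0+): callers operate on the array element in place.
    public ref LongCell At(int index) => ref _cells[index];

    // Atomic increment of one element, without copying the struct out of the array.
    public long Increment(int index) => Interlocked.Increment(ref _cells[index].Value);
}
```

`Interlocked.Increment(ref table.At(2).Value)` updates the element itself, whereas `var copy = table.At(2); copy.Value = 100;` only changes the local copy.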

boxing conversion penalty

There shouldn't be any boxing? There's an extra pass-through call to the concrete struct, which should be inlinable.

clrjunkie commented 7 years ago

@benaadams

var balance = objectBalance.Value; balance.Subtract(10, 0);

but 'balance' is the value; how can it have 'Subtract'?

benaadams commented 7 years ago

but 'balance' is the value how can it have 'Subtract' ?

If Box<T> was used it would be the Atomic

clrjunkie commented 7 years ago

@benaadams

If Box was used it would be the Atomic

You mean in case the user did var balance = objectBalance? But there is no reason to do so other than by mistake, and there can be many other mistakes...

Looks like another reason why having a struct version is bad.

Shouldn't be any boxing?

I meant the "box" problem in general, but as soon as you start invoking methods that call the atomic through the base interface methods, notably IEquatable, the unboxing penalty will start showing.

Why do we need this?

clrjunkie commented 7 years ago

@benaadams

If Box<T> was used it would be the Atomic

How can the Atomic be returned and not T ?

public class Atomic<T> : IAtomic<T> where T : IEquatable<T>
private ValueAtomic<T> _atom;

Atomic(T value)
{
  _atom = new ValueAtomic<T>(value);
} 

public T Read() => _atom.Read();

public struct ValueAtomic<T> : IAtomic<T> where T : IEquatable<T>
private T _data;

public ValueAtomic(T value)
{
    _data = value;
}
public T Read();

clrjunkie commented 7 years ago

@benaadams

I think you consider the large collection (previously: array) scenario to be very important, no problem... what's the use case? Maybe it involves comparisons or lookups, and the unboxing will have a negative effect overall?

clrjunkie commented 7 years ago

@benaadams

people take different approaches to things, and its to leave the flexibility open.

Are you advocating that each class in the framework should also have a struct version to satisfy every potential need or programming approach?

benaadams commented 7 years ago

If Box was used it would be the Atomic

How can the Atomic be returned and not T ?

Was in answer to @GSPP suggestion about using Box<T> instead.

The value-type Atomic should be "non-copyable". So casting it to object or to IAtomic, assigning it to a local, or assigning another heap variable to it should all be compile errors, as they take a copy and don't modify the original. Passing it as a ref return or modifying it in place should be fine.

Kind of like how Span<T> is stack-only, so you can't cast it to object or take a heap reference to it. Equally, SpinWait should probably be stack-only like Span<T>, as more than one thing shouldn't spin on it; and SpinLock should have the same constraints as the value-type Atomic, since as soon as a copy is taken it is no longer the same lock.

The IAtomic interface is there so the generic constraint applies, letting one set of extension methods cover both types, and so the atomic type can be used with IoC/DI in tests, or if someone wants to create a different type that behaves functionally the same, etc.

I think you consider the large collection (was array) scenario to be very important, no problem.

Going back to the Java comparison it has AtomicReferenceArray for similar.

However, with the struct type you could use a normal .NET array, ValueAtomic<object>[100], for a similar effect, or project through a ReadOnlySpan<ValueAtomic<object>> for extra safety.

what's the use-case?

If you were organising data in a column-store manner, say with the data type Decimal (16 bytes), and you wanted to do a Sum:

ValueType: for loop over the array, proceeding in 32-byte steps, 2 per cache line (16-byte decimal + 8-byte null lock pointer + alignment).

ObjectType: for loop over the array, proceeding in 8-byte steps (pointer to Atomic), 8 per cache line; then dereference (potentially scattered data, otherwise) proceeding in 48-byte steps, about 1.3 per cache line (12-byte object header + 16-byte decimal + 8-byte null lock pointer + alignment).

So the value type would be faster.

And if it's an array of references, then using an object-type atomic would mean that using the object requires a double dereference: Array->Atomic->Object->Function.

benaadams commented 7 years ago

Are you advocating that each class in the framework should also have a struct version to satisfy every potential need or programing approach?

No; just certain building blocks that represent extensions over simple types. If you are dealing with small types like int, decimal, or a reference, then when adding extra features to them you want to be as light as possible, or else the object overhead dominates; though there are reasons for both.

Task<T> -> ValueTask<T>
Tuple<T1,T2,..> -> ValueTuple<T1,T2,..> (not sure about Tuple)
lock(object) -> SpinLock
T[] -> Span<T>/ReadOnlySpan<T>
Atomic<T> -> ValueAtomic<T>
etc.

However, I do think it would be excellent if escape analysis were done on function-local objects, allocating them on the stack rather than the heap if they didn't escape; but that's another story...

clrjunkie commented 7 years ago

@benaadams

If you were organising data in a column store manner and say the data type was Decimal (16 bytes) then if you wanted to do a Sum:

But you're doing 'Sum'... you would probably need to lock some range to get a consistent point-in-time snapshot. How does atomicity come into play here per data item??

benaadams commented 7 years ago

how does atomicity come into play here per data item

Maybe poor example, call it EstimatedSum :)

Preventing torn reads. Admittedly, you could get the same effect by using Interlocked.CompareExchange and making every entry an object; or, if you were using object pooling, a 128-bit Interlocked.CompareExchange, which is kind of where this started, as a more general form.

benaadams commented 7 years ago

@clrjunkie my main motivation is an Interlocked.CompareExchange128 with a fallback on platforms that don't support it. The struct works as exactly that; the class adds an extra allocation and an indirection on each call for the supported route.
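A sketch of what the fallback half of such a CompareExchange128 could look like, assuming a hypothetical Pair128 payload and a hypothetical Cas128Fallback class (neither is a .NET API). It mirrors Interlocked.CompareExchange by returning the original value. Note that a lock-based fallback is only correct if every read and write of the location goes through the same lock, which is one reason to wrap the data in a type rather than expose bare static methods.

```csharp
using System;

// Hypothetical 16-byte payload, e.g. a pointer-sized value plus a version.
public readonly struct Pair128 : IEquatable<Pair128>
{
    public readonly long Low;
    public readonly long High;
    public Pair128(long low, long high) { Low = low; High = high; }
    public bool Equals(Pair128 other) => Low == other.Low && High == other.High;
}

// Hypothetical fallback for platforms without a native 16-byte CAS.
public static class Cas128Fallback
{
    private static readonly object Gate = new object();

    // Returns the value seen before the call; the swap happened if it equals comparand.
    public static Pair128 CompareExchange(ref Pair128 location, Pair128 value, Pair128 comparand)
    {
        lock (Gate)
        {
            Pair128 original = location;
            if (original.Equals(comparand)) location = value;
            return original;
        }
    }

    // Reads under the same lock so a concurrent CompareExchange is never seen half-written.
    public static Pair128 Read(ref Pair128 location)
    {
        lock (Gate) return location;
    }
}
```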

clrjunkie commented 7 years ago

@benaadams

Going back to the Java comparison it has AtomicReferenceArray for similar.

You can’t infer from that a general requirement for optimized traversal of large arrays for the purpose of invoking an atomic operation on each element. Plus, the fact that support for atomicity is provided by the collection and not the element should encourage you to question why they decided to implement the collection in the first place and not stay with a simple array of AtomicReference.

my main motivation is for an Interlocked.CompareExchange128 with a fallback on platforms that don't support it.

For what practical purpose(s)?

Is this supported by any other major framework? (being on par is a valid motivation)

Did Windows provide a public api for this (on supported platforms)

You wrote at the beginning that the availability of the API would allow for more interesting lock-less algorithms, great! Then I genuinely believe that it’s an absolute must to include sample usage code that shows a scenario, at least in pseudocode, even from an academic paper.

How exactly are Atomic<T> and bitness related?

BTW, it reads like the CPU requirement for Windows 8.1 is for the Enterprise edition (see "Windows 8.1 Enterprise: System Requirements").

The struct works as that, the class adds an extra allocation and indirection on the call for the supported route.

I’m challenging the struct api, not the implementation, because:

No obvious requirement stands out for traversing large lists of atomic’s specifically.
I bet most developers would use at most one atomic instance per algorithm; experts two, geniuses three.
Structs can be tricky, that is on top of synchronization which in itself is tricky.

clrjunkie commented 7 years ago

@benaadams Make no mistake, I would love to see a use-case that takes advantage of the struct version and say, Aha!

benaadams commented 7 years ago

Is this supported by any other major framework? (being on par is a valid motivation)

Boost.Atomic (pre-C++11): http://www.boost.org/doc/libs/master/doc/html/atomic/interface.html
Clang/GCC C++11 std::atomic: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

Java AtomicStampedReference and AtomicMarkableReference

Go bytecode decoder (I don't know much about Go, so I'm not sure how it's used)

Did Windows provide a public api for this (on supported platforms)

GCC/Clang on Linux support it via std::atomic

Windows via InterlockedCompareExchange128 https://msdn.microsoft.com/en-us/library/windows/desktop/hh972640%28v=vs.85%29.aspx (128-bit std::atomic in MSVC isn't lock-free)

Min client: Windows 8; Min server: Windows Server 2012

How exactly are Atomic<T> and bitness related?

It's the double-width CAS (dwCAS), i.e. a lock-free 2x64-bit swap: it allows a CAS on a 64-bit pointer plus a marker (ABA avoidance without a GC), or a 2x64-bit pointer swap for a doubly linked list.
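To show why the marker matters: with a reference-only CAS, a value that changes A -> B -> A between a read and the CAS looks unchanged, whereas a (reference, version) pair detects it. In .NET the GC keeps a live object from being recycled, so this mostly bites with object pooling. Below is a single-threaded simulation with hypothetical Node, Versioned<T> and VersionedCell<T> types (Versioned<T> is in the spirit of the proposal's VersionedReference<TRef>); a lock stands in for the 16-byte CAS.

```csharp
// Hypothetical node type for the demo.
public sealed class Node
{
    public readonly string Name;
    public Node(string name) => Name = name;
}

// Hypothetical (reference, version) pair.
public readonly struct Versioned<T> where T : class
{
    public readonly T Reference;
    public readonly long Version;
    public Versioned(T reference, long version) { Reference = reference; Version = version; }
}

// Hypothetical cell; a lock stands in for a native double-width CAS.
public sealed class VersionedCell<T> where T : class
{
    private readonly object _gate = new object();
    private Versioned<T> _current;

    public VersionedCell(T initial) => _current = new Versioned<T>(initial, 0);

    public Versioned<T> Read() { lock (_gate) return _current; }

    // Succeeds only if both reference and version still match, then bumps the version.
    public bool TryUpdate(T newReference, Versioned<T> expected)
    {
        lock (_gate)
        {
            if (!ReferenceEquals(_current.Reference, expected.Reference) ||
                _current.Version != expected.Version)
                return false;
            _current = new Versioned<T>(newReference, expected.Version + 1);
            return true;
        }
    }
}
```

In the simulation, a reference-only Interlocked.CompareExchange still succeeds after A -> B -> A, while TryUpdate with the stale (A, 0) snapshot fails because the version has moved to 2.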

BTW it reads like the cpu requirement for Windows 8.1 is for the enterprise edition.

It applies to regular 8.1 x64 too: https://support.microsoft.com/en-us/help/12660/windows-8-system-requirements (and Windows 10)

To install a 64-bit OS on a 64-bit PC, your processor needs to support CMPXCHG16b, PrefetchW, and LAHF/SAHF

It's been available on all AMD64 CPUs other than pre-2006 AMD ones, so all Intel x64 CPUs have it, which means macOS/OS X always has support. It's a requirement for Win 8.1 x64, Win 10 x64, Server 2012 R2 x64 and Server 2016 x64.

Linux x64, Win7 x64 and Win8 x64 may be running on first-generation Opterons, so a fallback would be needed in the x86 world (and for 32-bit). Fallbacks would also be needed for ARM (pre-ARMv[78]?).

No obvious requirement stands out for traversing large lists of atomics specifically.

Maybe something like Non-blocking Trees? http://www.cs.toronto.edu/~tabrown/chromatic/paper.pdf

I bet most developers would use at most one atomic instance per algorithm

Struct directly embedded in class rather than extra indirection?

There isn't a functionality gap that the struct would provide that the class couldn't. (Except that it would probably be easier to align a specific struct, ValueAtomic<T>, on a 16-byte boundary than a general struct T inside Atomic<T>. If that aligned struct exists, why not also expose it for direct use?)

However, equally, there isn't any functionality you can get from a 128-bit CAS that you can't get from a lock. It's for performance reasons that the direct struct is better than the class.

Also my main use case is around object pooling and memory management, rather than other lock-free data structures, so allocating another object to do it bothers me in principle :wink:

Structs can be tricky

Hence the class variant also.

You wrote at the beginning that the availability of the api would allow for more interesting lock-less algorithms, great! Then I genuinely believe it's an absolute must to include sample usage code that shows a scenario, at least in pseudo code, even if it's taken from an academic paper.

Will aim to provide :smile:

clrjunkie commented 7 years ago

@benaadams

(pre-C++ 11) Boost Atomic http://www.boost.org/doc/libs/master/doc/html/atomic/interface.html

Bingo.

Windows via InterlockedCompareExchange128 https://msdn.microsoft.com/en-us/library/windows/desktop/hh972640%28v=vs.85%29.aspx (128bit std::atomic in MSVC isn't lock free)

Bingo^2.

How exactly are Atomic<T> and bitness related?

It's the dwCAS, i.e. a lock-free 2x64-bit swap: being able to CAS a 64-bit pointer together with a marker (ABA avoidance without GC), or to swap two 64-bit pointers at once for a doubly linked list.

Don't understand. Can you simplify?

(edit: Don't understand how you got from 128bit to T)

I bet most developers would use at most one atomic instance per algorithm

Struct directly embedded in class rather than extra indirection?

I was referring to atomic variables in general, in which case AFAIK accessing either a class or a struct declared at the field level would involve one indirection for both.

Also my main use case is around object pooling and memory management,

Oh, I definitely got that.. :smiley:

benaadams commented 7 years ago

It's the dwCAS, i.e. a lock-free 2x64-bit swap: being able to CAS a 64-bit pointer together with a marker (ABA avoidance without GC)

Don't understand. Can you simplify?

Was writing a response with some examples and discovered the api needs some tweaking... Might need a full example set to get the api right :open_mouth:

benaadams commented 7 years ago

Started an initial project that I'll explore this further in https://github.com/benaadams/System.Threading.Atomics

clrjunkie commented 7 years ago

At this stage, I suggest you keep it driven towards one or two common scenarios. Skip the error handling.

whoisj commented 7 years ago

What of this is not already available via the Volatile class? I'm likely just missing something, so I wanted to ask.

clrjunkie commented 7 years ago

@whoisj As I see it, this API aims to be a cross-platform OO abstraction for both volatile and interlocked operations. Apparently the API's fallback capability (e.g. use 'lock' when the O/S doesn't support the CPU instruction CMPXCHG16B) is too easily overlooked. @benaadams it might be worth highlighting this at the very beginning, emphasizing that the 'lock fallback' is per instance.

benaadams commented 7 years ago

@whoisj Volatile and Interlocked are static classes representing actions on memory locations, with a 1:1 mapping to hardware capabilities, so they are limited to the lowest common denominator (e.g. currently no 128-bit CAS).

The Atomic type is an abstraction on top, which allows fallbacks and also allows more complex behaviour to be captured in the types.

If, for example, you wanted to atomically count up, but only to a maximum level, and atomically count down, but only to a minimum, then you'd have to do something like this (taken from ThreadPool.cs):

int outstandingRequests;

internal void EnsureThreadRequested()
{
    // Add one to a maximum of processorCount
    int count = Volatile.Read(ref outstandingRequests);
    while (count < processorCount)
    {
        int prev = Interlocked.CompareExchange(ref outstandingRequests, count + 1, count);
        if (prev == count)
        {
            ThreadPool.RequestWorkerThread();
            break;
        }
        count = prev;
    }
}

internal void MarkThreadRequestSatisfied()
{
    // Subtract one to a minimum of 0
    int count = Volatile.Read(ref outstandingRequests);
    while (count > 0)
    {
        int prev = Interlocked.CompareExchange(ref outstandingRequests, count - 1, count);
        if (prev == count)
        {
            break;
        }
        count = prev;
    }
}

Whereas with Atomic you could do

ValueAtomic<int> outstandingRequests;

internal void EnsureThreadRequested()
{
    if (outstandingRequests.CappedIncrement(processorCount).Success)
    {
        ThreadPool.RequestWorkerThread();
    }
}

internal void MarkThreadRequestSatisfied()
{
    outstandingRequests.CappedDecrement(0);
}

clrjunkie commented 7 years ago

Both CappedIncrement and CappedDecrement should be listed under "Proposed Api" along with a clear explanation for why they should exist, specifically what does it mean to:

atomically count up, but to a maximum level
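One way to read "atomically count up, but to a maximum level": the increment is applied only if the current value is below the cap, and the check and the write happen in a single CAS, so concurrent callers can never push the count past the cap. Below is a minimal sketch of that semantics over a plain int, essentially the ThreadPool.cs loops above folded into helpers. The names CappedCounter, CappedIncrement and CappedDecrement, and the (Success, Value) result shape chosen to match the .Success used above, are assumptions for illustration, not a settled API.

```csharp
using System.Threading;

// Hypothetical helpers illustrating capped increment/decrement semantics.
public static class CappedCounter
{
    // Adds one to 'location' only while it is below 'max'.
    // On success, Value is the new value; on failure nothing is written
    // and Value is the last observed value (already >= max).
    public static (bool Success, int Value) CappedIncrement(ref int location, int max)
    {
        int count = Volatile.Read(ref location);
        while (count < max)
        {
            int prev = Interlocked.CompareExchange(ref location, count + 1, count);
            if (prev == count) return (true, count + 1);
            count = prev; // lost a race: retry against the value another thread wrote
        }
        return (false, count);
    }

    // Subtracts one from 'location' only while it is above 'min'.
    public static (bool Success, int Value) CappedDecrement(ref int location, int min)
    {
        int count = Volatile.Read(ref location);
        while (count > min)
        {
            int prev = Interlocked.CompareExchange(ref location, count - 1, count);
            if (prev == count) return (true, count - 1);
            count = prev;
        }
        return (false, count);
    }
}
```

Wrapping this in a ValueAtomic<int> would mostly hide the `ref` and the retry loop; the semantics stay the same.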

whoisj commented 7 years ago

API's fallback capability (e.g. use 'lock' when the O/S doesn't support the CPU instruction CMPXCHG16B) is too easily overlooked.

@clrjunkie Thanks, that's exactly what I was getting at. Volatile already does the Monitor fallback to gloss over hardware differences. I think the approach is novel, but I often feel the days of OO design are behind us and the work is moving more towards a functional future.

kouvel commented 7 years ago

@benaadams:

... and can fallback to locks when the data width is not supported on the platform

My understanding is that interlocked operations need to be atomic with respect to all other interlocked operations. So if the 128-bit interlocked compare-exchange operation is not supported by the processor and a lock is used instead, its implementation would not be atomic with respect to other 64-bit interlocked compare-exchange operations on the same data for instance.

If that's correct, then we cannot fall back to using locks, and have to instead provide an API that determines whether it's available, and have the 128-bit interlocked compare-exchange throw if attempted.

I believe the 64-bit interlocked compare-exchange is available on all currently supported architectures, so it doesn't need such an API.

benaadams commented 7 years ago

If that's correct, then we cannot fall back to using locks, and have to instead provide an API that determines whether it's available, and have the 128-bit interlocked compare-exchange throw if attempted.

As the Atomic controls both the reads and the writes, they can both be locked, whether with a lock, spinlock, reader/writer lock, left/right lock, etc.
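A minimal sketch of that idea, assuming every access to the value goes through the wrapper: because Read, Write, Exchange and CompareExchange all take the same lock, they are atomic with respect to each other even without CMPXCHG16B. It is not atomic with respect to code that touches the same memory directly with Interlocked, which is the concern raised above. LockedAtomic<T> and TaggedRef are hypothetical names; the member shape only loosely mirrors the proposal's IAtomic<T>.

```csharp
using System;

// Hypothetical lock-based fallback for when the platform cannot CAS sizeof(T) bytes natively.
public sealed class LockedAtomic<T> where T : struct, IEquatable<T>
{
    private readonly object _gate = new object();
    private T _value;

    public LockedAtomic(T initial) => _value = initial;

    // This fallback always takes a lock.
    public bool IsLockFree => false;

    public T Read() { lock (_gate) return _value; }

    public void Write(T value) { lock (_gate) _value = value; }

    public T Exchange(T value)
    {
        lock (_gate)
        {
            T old = _value;
            _value = value;
            return old;
        }
    }

    // Stores 'value' only if the current value equals 'comparand'; returns true on success.
    public bool CompareExchange(T value, T comparand)
    {
        lock (_gate)
        {
            if (!_value.Equals(comparand)) return false;
            _value = value;
            return true;
        }
    }
}

// Example 16-byte payload: a 64-bit "pointer" (here just a long) plus a 64-bit ABA marker.
public readonly struct TaggedRef : IEquatable<TaggedRef>
{
    public readonly long Ptr;
    public readonly long Tag;
    public TaggedRef(long ptr, long tag) { Ptr = ptr; Tag = tag; }
    public bool Equals(TaggedRef other) => Ptr == other.Ptr && Tag == other.Tag;
    public override bool Equals(object obj) => obj is TaggedRef t && Equals(t);
    public override int GetHashCode() => HashCode.Combine(Ptr, Tag);
}
```

A real implementation would pick the cheapest lock that fits (a spinlock is plausible for such short critical sections) and would only use this path when IsLockFree would otherwise be false.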

I believe the 64-bit interlocked compare-exchange is available on all currently supported architectures, so it doesn't need such an API.

It doesn't work for structs unless you cast through IntPtr, which is a bit ugly, and that only works for structs of up to 8 bytes. A 16-byte struct can only be compare-exchanged on some platforms, and being a struct there is no handle to use to provide a fallback; likewise if you went bigger with 32 or 64 bytes. So it would need a generic constraint related to its sizeof, which I'd imagine wouldn't be forthcoming...
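For reference, the reinterpret-cast workaround for structs of up to 8 bytes looks roughly like the sketch below. It reinterprets the struct as a long with Unsafe.As rather than going through IntPtr, so treat it as a variation on the approach described. It assumes .NET Core 3.0 or later (where System.Runtime.CompilerServices.Unsafe is in-box), an 8-byte-aligned location, and a struct without padding bytes, since equality is bitwise. StructCas and Pair are hypothetical names. There is no 16-byte Interlocked overload to cast to, which is the gap discussed here.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

public static class StructCas
{
    // Compare-exchanges an 8-byte unmanaged struct by reinterpreting it as a long.
    // Assumes 'location' is 8-byte aligned (atomicity of misaligned CAS is platform-dependent)
    // and that T has no padding, because the comparison is on raw bits, not T.Equals.
    public static bool CompareExchange<T>(ref T location, T value, T comparand) where T : unmanaged
    {
        if (Unsafe.SizeOf<T>() != sizeof(long))
            throw new NotSupportedException(
                $"{typeof(T)} is {Unsafe.SizeOf<T>()} bytes; only 8-byte structs fit in a 64-bit CAS.");

        long newBits = Unsafe.As<T, long>(ref value);
        long expectedBits = Unsafe.As<T, long>(ref comparand);
        long observed = Interlocked.CompareExchange(ref Unsafe.As<T, long>(ref location), newBits, expectedBits);
        return observed == expectedBits;
    }
}

// Example 8-byte payload: a 32-bit index plus a 32-bit stamp.
public struct Pair
{
    public int Index;
    public int Stamp;
}
```

A 16-byte Pair of two longs would hit the NotSupportedException path, and as a struct there is nothing to lock on instead.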

kouvel commented 7 years ago

As the Atomic controls both the reads and the writes, they can both be locked, whether with a lock, spinlock, reader/writer lock, left/right lock, etc.

But does that mean all interlocked operations need to use a lock just to support 128-bit interlocked compare-exchange, when it's not available in the processor?

It doesn't work for structs unless you cast through IntPtr which is a bit ugly; and that can only work up to 8 byte structs

I see

whoisj commented 7 years ago

The Windows operating system exposes these constructs, which means it's very possible to implement them on Intel and ARM.

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686360(v=vs.85).aspx

Looks like they support quite a number of "interlocked" operations, up to 128 bits.

Given that the compiler often doesn't know the size of structs at "compile" time, I'm not wholly sure how the library is supposed to know if it can safely do atomic operations on structs with intrinsics or not.

benaadams commented 7 years ago

But does that mean all interlocked operations need to use a lock just to support 128-bit interlocked compare-exchange, when it's not available in the processor?

They would need to use some kind of lock, and both ValueAtomic<T> and Atomic<T> would have an IsLockFree boolean to allow users to choose a different strategy if they felt the framework's choice of fallback locking mechanism wasn't suitable.

But the majority of users would be happy that it was atomic under all circumstances, and hopefully the framework's choice of fallback lock would be suitable for the general case.

benaadams commented 7 years ago

Given that the compiler often doesn't know the size of structs at "compile" time, I'm not wholly sure how the library is supposed to know if it can safely do atomic operations on structs with intrinsics or not.

It could be done at JIT time in the same way as Vectors. With AOT the sizes would be known, but the available instructions would still need to be determined at runtime. Both would need to support up to 16-byte alignment, so this would need to work with the allocator.

benaadams commented 7 years ago

Independent of the framework you could probably over allocate then use the Unsafe library to cast to aligned structs; however I'm not sure how it could safely be done with structs with references...

clrjunkie commented 7 years ago

@kouvel

My understanding is that interlocked operations need to be atomic with respect to all other interlocked operations. So if the 128-bit interlocked compare-exchange operation is not supported by the processor and a lock is used instead, its implementation would not be atomic with respect to other 64-bit interlocked compare-exchange operations on the same data for instance.

I think the first question is: what does it mean to do a 128 bit interlocked compare-exchange operation on a T?

clrjunkie commented 7 years ago

@whoisj

Given that the compiler often doesn't know the size of structs at "compile" time, I'm not wholly sure how the library is supposed to know if it can safely do atomic operations on structs with intrinsics or not.

Exactly.

whoisj commented 7 years ago

Independent of the framework you could probably over allocate then use the Unsafe library to cast to aligned structs; however I'm not sure how it could safely be done with structs with references...

It cannot. So we have a problem. Anything involving mutating a "reference structure" (ie any thing that is or contains a reference) cannot be treated as flat memory, therefore there are severe limiting factors here.

Honestly, CRITICAL_SECTIONs are stupid fast, and Monitor relies heavily on them, with the added advantage that Monitor is likely highly tuned per platform. Why not just rely on it?

In test after real-world test, the Monitor approach has proven to be equivalent to, and occasionally better than, the non-locking model.

I like Atomic<T>, but I think it needs to just rely on Volatile.Read/Write and Monitor.Enter/Exit to get things done.
