API Proposal: Add Interlocked ops w/ explicit memoryOrder

sdmaclea commented 6 years ago

While trying to stabilize the thread pool for linux-arm64 during the release 2.1 effort, it became apparent that the safest approach was to assume that existing code relies on each interlocked operation acting as a barrier that enforces sequential consistency, at least with respect to the operations before and after it.

While this approach is likely to guarantee functional correctness in most legacy code, it comes at a significant cost on weakly ordered machines. It is also rare that an Interlocked operation actually needs to guarantee sequential consistency.

This proposal adds a MemoryOrder parameter to each atomic interlocked operation.

The proposal currently does not declare the MemoryOrder parameter with a default value (MemoryOrder memoryOrder = SequentiallyConsistent) because the overloads without the parameter already exist and can be presumed to continue to exist in order to support NetStandard2.1 and earlier.

namespace System.Threading
{
    public enum MemoryOrder
    {
        SequentiallyConsistent,
        AcquireRelease,
        Release,
        Acquire,
        Consume,
        Relaxed
    }

    public static partial class Interlocked
    {
        public static int Add(ref int location1, int value, MemoryOrder memoryOrder);
        public static long Add(ref long location1, long value, MemoryOrder memoryOrder);
        public static double CompareExchange(ref double location1, double value, double comparand, MemoryOrder memoryOrderSuccess, MemoryOrder memoryOrderFail);
        public static int CompareExchange(ref int location1, int value, int comparand, MemoryOrder memoryOrderSuccess, MemoryOrder memoryOrderFail);
        public static long CompareExchange(ref long location1, long value, long comparand, MemoryOrder memoryOrderSuccess, MemoryOrder memoryOrderFail);
        public static IntPtr CompareExchange(ref IntPtr location1, IntPtr value, IntPtr comparand, MemoryOrder memoryOrderSuccess, MemoryOrder memoryOrderFail);
        public static object CompareExchange(ref object location1, object value, object comparand, MemoryOrder memoryOrderSuccess, MemoryOrder memoryOrderFail);
        public static float CompareExchange(ref float location1, float value, float comparand, MemoryOrder memoryOrderSuccess, MemoryOrder memoryOrderFail);
        public static T CompareExchange<T>(ref T location1, T value, T comparand, MemoryOrder memoryOrderSuccess, MemoryOrder memoryOrderFail) where T : class;
        public static int Decrement(ref int location, MemoryOrder memoryOrder);
        public static long Decrement(ref long location, MemoryOrder memoryOrder);
        public static double Exchange(ref double location1, double value, MemoryOrder memoryOrder);
        public static int Exchange(ref int location1, int value, MemoryOrder memoryOrder);
        public static long Exchange(ref long location1, long value, MemoryOrder memoryOrder);
        public static IntPtr Exchange(ref IntPtr location1, IntPtr value, MemoryOrder memoryOrder);
        public static object Exchange(ref object location1, object value, MemoryOrder memoryOrder);
        public static float Exchange(ref float location1, float value, MemoryOrder memoryOrder);
        public static T Exchange<T>(ref T location1, T value, MemoryOrder memoryOrder) where T : class;
        public static int Increment(ref int location, MemoryOrder memoryOrder);
        public static long Increment(ref long location, MemoryOrder memoryOrder);
        public static void MemoryBarrier(MemoryOrder memoryOrder);
    }
}
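As a rough illustration of how call sites might look under this proposal, here is a minimal sketch. The proposed overloads do not exist in System.Threading, so `MemoryOrder` is redeclared locally and `InterlockedSketch` is a hypothetical stand-in that ignores the requested ordering and forwards to the existing full-fence operations (conservative, but never weaker than what was asked for). The counter and the owner flag are made-up examples.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical local copy of the proposed enum; not part of System.Threading today.
public enum MemoryOrder { SequentiallyConsistent, AcquireRelease, Release, Acquire, Consume, Relaxed }

// Hypothetical stand-in for the proposed overloads. Every ordering is mapped to the
// existing sequentially consistent operation, which is always a valid implementation.
public static class InterlockedSketch
{
    public static int Increment(ref int location, MemoryOrder memoryOrder)
        => Interlocked.Increment(ref location);

    public static int CompareExchange(ref int location1, int value, int comparand,
        MemoryOrder memoryOrderSuccess, MemoryOrder memoryOrderFail)
        => Interlocked.CompareExchange(ref location1, value, comparand);
}

public static class Program
{
    private static int s_hits;   // statistics counter: only atomicity matters
    private static int s_owner;  // 0 = free, otherwise the id of the owner

    public static int CountHits(int workers, int perWorker)
    {
        s_hits = 0;
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < perWorker; i++)
            {
                // No other data is published through this counter, so Relaxed would suffice.
                InterlockedSketch.Increment(ref s_hits, MemoryOrder.Relaxed);
            }
        });
        return s_hits; // Parallel.For has completed, so all increments are visible here
    }

    public static bool TryAcquire(int id)
        // Acquire on success (we now own the resource), Relaxed on failure (nothing was learned).
        => InterlockedSketch.CompareExchange(ref s_owner, id, 0,
               MemoryOrder.Acquire, MemoryOrder.Relaxed) == 0;

    public static void Main()
    {
        Console.WriteLine(CountHits(4, 1000));
        Console.WriteLine(TryAcquire(1));
        Console.WriteLine(TryAcquire(2));
    }
}
```

On a weakly ordered machine, the point of the proposal is that the counter increment could avoid the full barrier that the existing overloads imply.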

GrabYourPitchforks commented 6 years ago

I'll also ping the GC team separately to follow up on the question I raised re: how these APIs interact with the card table.

sdmaclea commented 6 years ago

@GrabYourPitchforks These should be orthogonal to the write barriers required by the GC on heap allocations, heap reference writes....

GrabYourPitchforks commented 6 years ago

@sdmaclea For the most part I agree, but my concern was regarding the Exchange<T> and CompareExchange<T> APIs in particular. If the GC's going to force a particular memory ordering when updating the card table, then it could affect which memory orderings are valid for those two specific operations.

sdmaclea commented 6 years ago

That may turn out to be an implementation detail, where the Pointer/Reference forms must use a more restrictive memory ordering even though the user can request a weaker one. ExchangePtr and CompareExchangePtr are already handled differently, so your concern is very valid.

GrabYourPitchforks commented 6 years ago

@stephentoub @CarolEidt Regarding the case where these methods are called with a non-immediate value for MemoryOrder, would it be feasible to say that they should just generate SequentiallyConsistent instead of switching on the non-immediate value at runtime? For example:

public static int Increment(ref int location, MemoryOrder mo) => Increment(ref location);

public void MyMethod(ref int foo)
{
    // Passing an immediate; the JIT can generate appropriate assembly.
    Interlocked.Increment(ref foo, MemoryOrder.Relaxed);

    // Passing a non-immediate; this is treated as a normal method call instead of an intrinsic,
    // so in the end it just turns into a standard Increment(ref int) call.
    MemoryOrder mo = (MemoryOrder)(new Random().Next());
    Interlocked.Increment(ref foo, mo);
}

I wonder if this would simplify the logic a bit. For reference, there was discussion of having a code analyzer that would flag non-immediates passed to some of the hardware intrinsic APIs. (See https://github.com/dotnet/coreclr/issues/15795#issuecomment-356431086.) That may be useful here as well.

terrajobst commented 6 years ago

Are we sure the enum members are sensibly named? The names do not make a lot of sense to us, but that might just be domain knowledge. Also, @GrabYourPitchforks just noticed that Release is being deprecated. Before approving I'd like to get confirmation on the names.

sdmaclea commented 6 years ago

@terrajobst I see no indication that release is deprecated. The page at https://en.cppreference.com/w/cpp/atomic/memory_order could lead one to believe that, because the enums defining the memory order are changed to enum + constexpr in C++20.

GrabYourPitchforks commented 6 years ago

@sdmaclea Got it, I misinterpreted the (until C++20) marker as applying specifically to memory_order_release instead of to the entire memory_order typedef. That's my mistake.

CarolEidt commented 6 years ago

would it be feasible to say that they should just generate SequentiallyConsistent instead of switching on the non-immediate value at runtime? ... I wonder if this would simplify the logic a bit.

I don't see how this would simplify the logic. The existing intrinsic support in the JIT makes heavy use of the "non-immediate value falls back to recursive case which JIT expands" approach. It works and is relatively straightforward.

Are we sure the enum members are sensibly named? The names do not make a lot of sense to us

They make sense to me - and I believe they are consistent with general usage, not just in C++, but in memory model discussions.

GrabYourPitchforks commented 6 years ago

I don't see how this would simplify the logic. The existing intrinsic support in the JIT makes heavy use of the "non-immediate value falls back to recursive case which JIT expands" approach. It works and is relatively straightforward.

What I meant is that the implementation could look like this:

[Intrinsic]
public static int Method(..., MemoryOrder order) => Method(..., MemoryOrder.SequentiallyConsistent);

You'd still have the recursive call, but this basically turns into "if the JIT can't determine the literal value of the MemoryOrder parameter, it says screw it and treats the call site as if it were sequentially consistent." That avoids us having to write a switch statement inside the method implementations.
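To make the two options concrete, here is a minimal sketch contrasting a runtime switch with the single-line fallback. It is an illustration under assumptions, not the JIT's actual mechanism: `MemoryOrder` is redeclared locally, `InterlockedSketch` and its method names are hypothetical, and the two private helpers merely stand in for the per-ordering code the JIT would emit (both use the existing full-fence `Interlocked.Increment`).

```csharp
using System;
using System.Threading;

// Hypothetical local copy of the proposed enum; not part of System.Threading today.
public enum MemoryOrder { SequentiallyConsistent, AcquireRelease, Release, Acquire, Consume, Relaxed }

public static class InterlockedSketch
{
    // Stand-ins for per-ordering code generation. In this sketch both use the existing
    // full-fence operation; a real JIT expansion would differ on weakly ordered hardware.
    private static int IncrementSeqCst(ref int location) => Interlocked.Increment(ref location);
    private static int IncrementWeaker(ref int location) => Interlocked.Increment(ref location);

    // Option 1: switch on the runtime value. Every enum member needs a case,
    // and out-of-range values have to be rejected explicitly.
    public static int IncrementWithSwitch(ref int location, MemoryOrder order)
    {
        switch (order)
        {
            case MemoryOrder.Relaxed:
            case MemoryOrder.Consume:
            case MemoryOrder.Acquire:
            case MemoryOrder.Release:
            case MemoryOrder.AcquireRelease:
                return IncrementWeaker(ref location);
            case MemoryOrder.SequentiallyConsistent:
                return IncrementSeqCst(ref location);
            default:
                throw new ArgumentOutOfRangeException(nameof(order));
        }
    }

    // Option 2: when the ordering is not a constant, ignore it and use the
    // strongest ordering. No switch, and no validation of the value either.
    public static int IncrementWithFallback(ref int location, MemoryOrder order)
        => Interlocked.Increment(ref location);
}

public static class Program
{
    public static void Main()
    {
        int x = 0;
        Console.WriteLine(InterlockedSketch.IncrementWithSwitch(ref x, MemoryOrder.Relaxed));
        // An arbitrary cast value is silently accepted by the fallback form.
        Console.WriteLine(InterlockedSketch.IncrementWithFallback(ref x, (MemoryOrder)12345));
    }
}
```

One visible difference in this sketch: the switch form has to decide what to do with a value that is not a defined enum member, while the fallback form treats any value as sequentially consistent.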

jakobbotsch commented 4 years ago

I do not think MemoryOrder.Consume should necessarily be included. It does not seem to be well-defined in C++, and the standard recommends not using it:

memory_order::consume: a load operation performs a consume operation on the affected memory location. [Note: Prefer memory_order::acquire, which provides stronger guarantees than memory_order::consume. Implementations have found it infeasible to provide performance better than that of memory_order::acquire. Specification revisions are under consideration. —end note]

The C++ standards committee has also had problems defining memory_order::relaxed. Hans Boehm talks about both memory_order::relaxed and memory_order::consume here (he calls memory_order::consume a failed experiment): https://www.youtube.com/watch?v=M15UKpNlpeM&feature=youtu.be&t=1283
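For readers less familiar with the terminology, the acquire ordering that the note recommends over consume can already be expressed in .NET with the existing `Volatile.Read` (acquire) and `Volatile.Write` (release) methods. The following is a minimal sketch of the usual publish/observe pattern; the `Publication` class and its member names are made up for illustration and are not part of the proposal.

```csharp
using System;
using System.Threading;

public static class Publication
{
    private static int s_payload;
    private static bool s_ready;

    public static void Publish(int value)
    {
        s_payload = value;                 // ordinary write
        Volatile.Write(ref s_ready, true); // release: the payload write cannot move after this
    }

    public static bool TryObserve(out int value)
    {
        if (Volatile.Read(ref s_ready))    // acquire: the payload read cannot move before this
        {
            value = s_payload;
            return true;
        }
        value = 0;
        return false;
    }

    public static void Main()
    {
        var producer = new Thread(() => Publish(42));
        producer.Start();

        int observed;
        var spin = new SpinWait();
        while (!TryObserve(out observed))
        {
            spin.SpinOnce(); // wait until the flag has been published
        }
        producer.Join();
        Console.WriteLine(observed);
    }
}
```

A consume load would, in principle, order only the reads that depend on the loaded value; the acquire read used here orders all later reads, which is the stronger guarantee the quoted note refers to.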

dotnet / runtime

API Proposal: Add Interlocked ops w/ explicit memoryOrder #26092