Provide a user-facing mechanism to force processor atomics

ronawho commented 6 years ago

We support processor atomics everywhere and network atomics for some configurations (really just ugni on Cray's today.) For atomics that are only used on a single node, processor atomics offer better performance, especially under low-contention. Under high contention, blocking ugni atomics are ~2-3x slower than processor atomics, and up an order of magnitude or more slower under very low contention levels.

To force processor atomics, we have an ugly internal routine called chpl__processorAtomicType(), but we should provide some sort of user-facing routine:

Ideas that we've tossed around before:

var a: local atomic int;
var a: atomic(int, AtomicType.Processor);
var a: processorAtomic(int);

LouisJenkinsCS commented 6 years ago

I'd welcome this addition.

LouisJenkinsCS commented 6 years ago

Quick question is there a way to do this right now? I.E, is it possible to use chpl__processorAtomicType(int) as a type, right now? I kinda can't wait to use it now rather than wait for library

Edit: Nevermind, just re-read the description of this issue

ronawho commented 6 years ago

Yes, chpl__processorAtomicType is the current way to do this -- maybe interesting: https://github.com/chapel-lang/chapel/pull/10552

LouisJenkinsCS commented 6 years ago

Also in terms of how I believe syntax should work for how to create processor atomics over network atomics... I believe that a pragma can be a better (and easier on the eyes) alternative for the time being.

pragma "use processor atomic"
var a : atomic int;

Compared to...

var a : chpl__processorAtomicType(int);

But that's just my opinion. Of the three you chose, I'd go with

var a: atomic(int, AtomicType.Processor);

Because then you could hand-wave away that AtomicType.Network is a default argument or something.

bradcray commented 6 years ago

Thanks for opening this issue! I was thinking about it the other week when thinking about Louis's performance comparisons for his aggregation work (and meant to post a question asking whether he was using network atomics on that thread, but timed out before heading on vacation).

Which of the following do we imagine the semantics of this feature to be:

use processor atomics, but permit the atomic to be accessed by local or remote tasks (where remote ones would presumably have to use an active message for systems in which processor and network atomics weren't coherent)
use processor atomics, and only permit the variable to be accessed by tasks that happen to be running locally to that locale
something else?

ronawho commented 6 years ago

Hmm, good question.

I had been imagining it just being a hint that the atomic will primarily be accessed locally, but still permit remote access to it (so processor atomics wrapped with on-stmts as they are today)

That said if an atomic is guaranteed to only be accessed locally we could further reduce overhead by getting rid of the on-stmt too. https://github.com/chapel-lang/chapel/pull/3890 optimized direct/local on stmts so that there's just an extra locality check and a branch, but there is a non-trivial performance difference for the atomic perf test for --local vs --no-local (integer fetchAdd is 1.65s for --local and 1.95s for --no-local).

LouisJenkinsCS commented 6 years ago

Could there be two separate ways of requesting remote processor atomics? One that makes it entirely local and one that just uses processor atomics with a locality check? Perhaps like my example below.

// All accesses are local; this should annihilate any and all on-statements and locality checking
local var a : atomic int;
// This should still perform locality checks and do an on-statement if accessing remotely
pragma "use processor atomics"
var aa : atomic int;

Why? You might have an atomic that is updated locally more often than it is checked globally.

bradcray commented 6 years ago

My thinking was similar to Elliot's and Louis's: I imagined that the feature we were discussing in this issue would still be accessible remotely by default; but that also having a way to clamp down and say "only permit me to access this locally" could also be useful as a separate feature request (it matches a past request from users for such a feature in other, non-atomic, contexts to avoid accidental communications, much like const protects against accidental assignments).

Based on our agreement on this point, I think we should avoid using the local keyword for this issue's request, as it implies to me more of a "can only be accessed locally" interpretation (let me know if you disagree). I think it is a viable and attractive feature if/when we want to define it (and would apply to general types, not just atomics). It'd be great for one of us to capture this as a separate issue (I can do it as I get more caught up if nobody else does).

I think my intuition goes to an approach that parameterizes the atomic keyword itself, though they end up being a little verbose:

var a: atomic(atomicKind.processor) int;   // could just be atomic(processor) via `use atomicKind`

or:

var a: atomic(processor=true) int;   // could just be atomic(true) if anyone could keep that straight

Of these two, the enum approach is probably the more principled and self-documenting way to go, and permits us to provide additional implementations in the future. I just wish it weren't so verbose (which might just suggest coming up with shorter names for the enum and its symbols).

I'm not crazy about using a pragma for my usual reasons (pragmas aren't intended to be user-facing, which is what this issue is asking for, and typically suggest to me that the language wasn't designed right). It also has the downside that it doesn't do a good job of supporting the use of param-time machinery to compute whether to use processor or network atomics, which I think is a must for this feature (and would ideally result in something other than a if (test) then atomicType1 else atomicType2 pattern, which bumps Elliot's third suggestion down in my rankings).

This also falls into the category of a feature that affects performance but not program correctness or behavior (right?) which at times has led to discussions of introducing some sort of user-facing annotation feature to the language. There, as with the pragmas, I worry about the ability to param control the property (though I suppose it all depends on how we design the annotation feature). The main downside to this approach is that we'd have to design the general feature before using it here (and, ideally, populate it with a few more cases, or at least have some ready to go...). But I'm not opposed to heading down that path if we think it's the right ultimate solution...

Given that I'm a little gun-shy about the general annotation feature (at least at this point in time), I currently lean towards something like Elliot's second suggestion (where I assume we'd specify that atomic int is sugar that rewrites to atomic(int, ...something...)) or the styles I make above.

chapel-lang / chapel

Provide a user-facing mechanism to force processor atomics #10551