Open mppf opened 5 years ago
Another related idea is to extend `on MyDataStructure` to run, say, `proc DataStructure.on(...)`, and then it would be relatively straightforward to support such patterns.
@dlongnecke-cray - it would be interesting to include some ideas from this issue as we develop a menu of choices to make when designing a context manager.
Here is a way to provide this feature using a `manage` statement, as proposed by @mppf, using the distributed histogram as an example. First off, if the user tries to take a `ref` to the histogram element, it will be disallowed because the data structure does not support it:
```chapel
forall idx in inputs {
  Counts[idx] += 1; // error: use 'manage' to update elements
}
```
The user needs to use this style:
```chapel
forall idx in inputs {
  manage Counts(idx) as element {
    element += 1;
  }
}
```
The compiler queries `Counts` for the availability of a designated API method. If successful, it converts the forall loop to aggregation style:
```chapel
forall idx in inputs with
  (var agg = Counts.createAggregator(update=lambda(idx, element) { element += 1; }))
{
  agg.add(idx);
}
```
The aggregator will batch up and send out the updates by the time its task completes, at the latest. The destination locale will perform the updates in batch by applying the updating lambda to each element. If a given `idx` is not yet in the index set of `Counts`, it will add that index and default-initialize the corresponding element before applying the lambda to it.
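As a concrete illustration, here is a minimal Python analogue of the aggregator transform described above. Everything here — the `Aggregator` class, the batch size, the flush policy — is an illustrative assumption, not Chapel API:

```python
class Aggregator:
    """Sketch of the per-task aggregator: buffers indices and applies
    the update lambda in batches, mimicking 'createAggregator' above."""
    def __init__(self, counts, update, batch_size=4):
        self.counts = counts      # the destination map ("Counts")
        self.update = update      # the update lambda, e.g. element += 1
        self.batch = []
        self.batch_size = batch_size

    def add(self, idx):
        self.batch.append(idx)
        if len(self.batch) >= self.batch_size:
            self.flush()

    def flush(self):
        # The "destination locale" applies the lambda to each buffered
        # element, default-initializing indices not yet present.
        for idx in self.batch:
            self.counts[idx] = self.update(idx, self.counts.get(idx, 0))
        self.batch.clear()

counts = {}
agg = Aggregator(counts, update=lambda idx, element: element + 1)
for idx in [2, 5, 2, 7, 2]:
    agg.add(idx)
agg.flush()  # at the latest, when the task completes
# counts is now {2: 3, 5: 1, 7: 1}
```

The point of the sketch is only the shape of the transform: per-task buffering, batch application at the destination, and default initialization of missing elements.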
One proposal is to make the pattern `manage Counts(idx) as element { ... do something with element ... }` be the standard way to operate on an element of a parallel, possibly distributed, data structure. If it occurs at a location other than the last statement of a forall or task-parallel construct, it should add/update the element atomically, without aggregation.
A variation on this proposal is to allow a data structure to delay the propagation of updates to remote locales -- and perhaps even to other tasks on the same locale -- until a memory fence is executed, in accordance with Chapel's memory consistency model. This would be part of the semantics of the data structure. To implement these semantics, a `manage Counts(idx)` statement would cause the compiler to obtain a handle from `Counts`, so that a designated API method is invoked on this handle when the handle's task performs a memory fence. This handle can be created at the start of the task or at the declaration of `Counts`, whichever comes later. In this case the `manage` statement can perform aggregation as well, except this time, instead of the updating lambda, it will store the new value for each "managed" index.
[Correction to the previous paragraph, 3/24] The complexity of obtaining a handle may not be needed. A notification of a memory fence could go directly to `Counts` via an API method.
An alternative way to perform bulk updates is to use a designated method passing it a lambda:
```chapel
Counts.bulkUpdate(inputs, lambda(idx, element) { element += 1; });
```
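A Python analogue of such a `bulkUpdate` method might look like the sketch below; the per-locale grouping is simulated with a modulo, and all names are assumptions for illustration:

```python
from collections import defaultdict

def bulk_update(counts, inputs, update, num_locales=2):
    """Hypothetical bulkUpdate analogue: group indices by their
    (simulated) destination locale so each destination receives one
    batch, then apply the update lambda on the 'remote' side."""
    batches = defaultdict(list)
    for idx in inputs:
        batches[idx % num_locales].append(idx)   # pick a destination
    for locale, idxs in batches.items():
        for idx in idxs:
            # default-initialize missing elements, then update
            counts[idx] = update(idx, counts.get(idx, 0))
    return counts

counts = bulk_update({}, [1, 3, 1, 4], lambda idx, element: element + 1)
# counts: {1: 2, 3: 1, 4: 1}
```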
The advantages of these approaches are:
A couple of other points:
> A variation on this proposal is to allow a data structure to delay the propagation of updates to remote locales
I call this "data structure consistency". Another element of it is that I expect that if you, say, add `x` to a distributed set, and then query from that same task whether `x` is in the set, it would say "yes". But the set element might belong on a remote locale, and it would just check the queued-up operations to know that `x` was added. In this way it is similar to `--cache-remote`.
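A toy Python sketch of this "data structure consistency" idea, with queued adds visible to same-task queries before a fence (the class and method names are hypothetical):

```python
class ConsistentSet:
    """Sketch: adds destined for a 'remote locale' are queued locally;
    membership queries from the same task consult the queue, so an
    element just added reads as present even before any fence."""
    def __init__(self):
        self.remote = set()   # stand-in for data held on other locales
        self.pending = []     # queued-up operations, not yet propagated

    def add(self, x):
        self.pending.append(x)        # defer the remote update

    def __contains__(self, x):
        # consult the queued operations first, like --cache-remote would
        return x in self.pending or x in self.remote

    def fence(self):
        # memory fence: propagate queued updates to their destination
        self.remote.update(self.pending)
        self.pending.clear()

s = ConsistentSet()
s.add(41)
assert 41 in s        # visible to this task before the fence
s.fence()
assert 41 in s and not s.pending
```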
In any case, I think that the understandability of the programs is still improved by having it be clear when the task of updating an element is completed. In particular, if we imagine a "data structure consistency" approach that allows references, one could write
```chapel
ref x = MyDistributedMap[i];
fence();
x += 1;
```
which I think is a challenging case to handle correctly in the implementation - meaning I think getting all of this right would be easier if that pattern is simply not possible.
If we go with the `sync` syntax, how does it know that `(i,j)` is the index that the lock is being held for in the body of the `sync`? For example:
```chapel
sync on Pairs(i,j) {
  Pairs += (i,j);
  Pairs.add(i,j);
  Pairs(i,j) = k; // if this were meaningful
  Pairs.remove(i,j);
  Pairs.doWhoKnowsWhat(a,i,j,b);
}
```
My proposal:
The object returned from `enterThis` (see the manage-based implementation idea above: https://github.com/chapel-lang/chapel/issues/12306#issuecomment-1078461074) is a "callback" into the data structure (`Pairs` in the above example), or we can call it a "marker", "representative", "stand-in", ... If the key is not in the data structure when `enterThis` returns, it is NOT added implicitly, and the callback object is still produced.
The compiler matches `i,j` from the `sync` header against the argument list of each function call in the `sync` body. If there is a match, the compiler attempts to resolve a modified version of the call, where the actuals `i,j` are replaced with a named actual `callback=(the callback object)`. If it resolves, then that call replaces the original call. It is up to the data structure to perform the operation using the callback object. The compiler guarantees that this call is performed within the (implicit) manage statement. This approach does not handle `Pairs += (i,j)`; I think that is OK.
Instead of replacing `i,j`, we could just add the `callback=...` argument. This may add the unnecessary cost of passing `i,j` when they are of complicated types, or it may simplify the data structure implementation. The compiler could try both variants.
The compiler ensures that `i` and `j` do not change throughout the `sync` body. For example, they are `const`, and not `const ref` to something that can change.
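To illustrate the callback-substitution idea, here is a hedged Python sketch in which a data structure method accepts either the raw key or a pre-resolved `callback=` handle; the names `Slot` and `enter_this` are hypothetical stand-ins for whatever `enterThis` would return:

```python
class Slot:
    """Hypothetical 'callback'/'stand-in' object: a resolved location
    inside the data structure for one key."""
    def __init__(self, store, key):
        self.store, self.key = store, key

class Pairs:
    def __init__(self):
        self.data = {}

    def enter_this(self, key):
        # resolve the key once; the key is NOT added implicitly
        return Slot(self.data, key)

    def add(self, key=None, callback=None):
        # rewritten calls pass callback= instead of the raw key,
        # so the structure avoids re-locating the key
        if callback is not None:
            callback.store[callback.key] = True
        else:
            self.data[key] = True

pairs = Pairs()
# compiler rewrite of:  sync on Pairs(i,j) { Pairs.add(i,j); }
slot = pairs.enter_this((1, 2))   # enterThis at block entry
pairs.add(callback=slot)          # actuals (i,j) replaced by callback=
assert (1, 2) in pairs.data
```

The design point is that the resolved handle lets the data structure skip re-hashing and re-locking the key inside each call within the block.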
This issue asks if we should add some language support for critical sections and if such support could enable smooth expression of certain aggregation patterns.
What do other languages use for critical-section (or similar) syntax?
Python
Python context manager (see e.g. https://en.wikibooks.org/wiki/Python_Programming/Context_Managers ).
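Concretely, the Python pattern uses the standard-library `threading.Lock`, which is itself a context manager:

```python
import threading

lock = threading.Lock()
counter = 0

def bump():
    global counter
    # 'with' calls lock.__enter__ on entry and lock.__exit__ on exit,
    # even if the body raises -- the critical section is the block
    with lock:
        counter += 1

threads = [threading.Thread(target=bump) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 8
```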
What might this look like in Chapel? (Supposing there is a lock library - see #9881).
C++ / RAII
Now, as noted in #10341, we could already write this using an RAII data type:
However this pattern seems a bit awkward in this case because the name of the variable `locked` is irrelevant. (As is the fact that it's a variable at all.)

Java
In Java, critical sections are easy to write with the `synchronized` block. Here's what that might look like in Chapel:

Further motivation
Why am I interested in this question?
Let's consider 3 cases:
Suppose we had syntax like `sync on <object> { body }` that would:

`<object>`
The implementation of `sync on` would be similar to the Python `with` context manager in that it calls methods on the noted object in order to create the synchronization. For now, we'll suppose that it calls `syncOnEnter` and `syncOnExit` methods.

Critical section with lock object
We could consider allowing this to be written `sync myLock { }` in the event that the programmer did not want to move execution when `myLock` is remote (which would be an unusual idiom).

The Lock type would simply support `syncOnEnter`, which takes the lock, and `syncOnExit`, which releases the lock. The compiler would arrange to call these at block entry and exit. For example, the compiler could translate it into:
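A minimal Python analogue of that translation, assuming only the `syncOnEnter`/`syncOnExit` protocol described above (the lock internals here are hypothetical):

```python
class Lock:
    """Toy lock supporting the syncOnEnter/syncOnExit protocol named
    in the text; this single-task sketch does no real blocking."""
    def __init__(self):
        self.held = False

    def syncOnEnter(self):
        assert not self.held
        self.held = True

    def syncOnExit(self):
        self.held = False

my_lock = Lock()

# 'sync myLock { body }' could be lowered to roughly:
my_lock.syncOnEnter()
try:
    body_ran = True        # ... the critical-section body ...
finally:
    my_lock.syncOnExit()   # runs even if the body throws

assert body_ran and not my_lock.held
```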
Populating a distributed set in parallel
What about the case of populating a distributed set in parallel? This is the case that https://chapel-lang.org/CHIUW/2016/Ferguson-slides.pdf studied, but the problem with the work described there is that it wasn't general enough - it involved writing handlers for the specific case of distributed set construction. Can we do better with some higher level language ideas?
Here is a sketch of what the code would look like:
Here the idea is that the `sync on` would:

Additionally, since the `sync on` is the last statement in the `forall`, it can be done in an unordered manner, following #12150, so the (i,j) elements to be added can be collected and distributed in bulk.

What would the implementation look like?
First, note that the `Pairs` HashedDomain would include `syncOnEnter()` and `syncOnExit()` methods that accept a key (so it can do locking per hashtable row if desired). The compiler would arrange to pass `(i,j)` as the argument to these in this particular case. It could thus translate it into
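A Python sketch of such per-row locking, with `syncOnEnter`/`syncOnExit` taking the key; the row count and hashing scheme are illustrative assumptions:

```python
import threading

class HashedSet:
    """Sketch: syncOnEnter/syncOnExit accept the key, so locking can be
    per hashtable row rather than one global lock."""
    NUM_ROWS = 8

    def __init__(self):
        self.rows = [set() for _ in range(self.NUM_ROWS)]
        self.locks = [threading.Lock() for _ in range(self.NUM_ROWS)]

    def _row(self, key):
        return hash(key) % self.NUM_ROWS

    def syncOnEnter(self, key):
        self.locks[self._row(key)].acquire()  # lock only this key's row

    def syncOnExit(self, key):
        self.locks[self._row(key)].release()

    def add(self, key):
        self.rows[self._row(key)].add(key)

pairs = HashedSet()
# translation of:  sync on Pairs(i,j) { Pairs.add((i,j)); }
key = (1, 2)
pairs.syncOnEnter(key)
try:
    pairs.add(key)
finally:
    pairs.syncOnExit(key)
assert key in pairs.rows[pairs._row(key)]
```

Because the lock taken depends on the key, two tasks updating keys that hash to different rows do not contend with each other.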
But the real fun begins if the last-statement-forall-unordered optimization fires (#12150 / #12269). In that event, the compiler and module code will aggregate by translating it into:
If this strategy is successful then `sync on` becomes common usage for updating a domain or array, and I think it would make sense to completely remove the `locking` argument from DefaultAssociative and HashedDom/Arr. That solves the problem of needing to know not to take the lock inside of `Pairs`. (However, we might nonetheless need a transitional strategy.)

Updating a distributed histogram
Here the idea is that the `sync on` would:

As above, since the `sync on` is the last statement in the `forall`, it can be done in an unordered manner, so the histogram updates can be aggregated and sent out in bulk.

This case is similar to the above (with slight differences in what arguments are bundled into the `syncOnFn`). The `syncOnFn` would consist of:
Other potential examples