Open LouisJenkinsCS opened 6 years ago
@LouisJenkinsCS : Thanks for shining a light on this. The lack of recycling / reuse in the privatization table has been a concern for me / us* since it was first implemented. We knew it was problematic, but needed to move forward with the intention of getting back to it later, which has then never happened. In fact, if you were to ask me about ancient dirty laundry that's festering somewhere in the code base and needing attention without current expertise on the team, this is one of the first things that would occur to me. I don't recall enough about what we have on master today to have great insight about what should be done to improve it, but wanted to express my support for moving from where we are today to something smarter / saner / more scalable.
With respect to your specific proposal, I'd need to spend more time with it to have an opinion, but wanted to voice my appreciation of your looking into this issue in the meantime.
Appreciate the kind words, Brad! I'm 100% certain Chapel is going to be the future go-to programming language for high performance computing, and by gosh am I going to help see it through!
Then whenever a privatized copy is called it just makes calls to chpl_getPrivatizedClass which is essentially an array access to chpl_privatzedObjects (names may not be 100% accurate). Now, my proposal is that, at least for ChapelArray, is the following:
For each privatized object, add a new hidden field (or force user to declare one) called privatizedObjects : [localeDomain] Obj; this field will be populated after all of the targeted locales finish with the privatized instances.
So, right now, there is 1 chpl_privatzedObjects array per locale. With your proposal, there's normally only 1 official array instance. That official array instance is stored on some locale.
Right now, if one locale has the "official" array instance, it can pass a reference to the privatized instance to another locale with no additional communication required - i.e. it just sends the integer privatization ID. In fact, all array references become just references to the privatized integer ID, so any time the array reference is moved around, it includes this integer ID. (By "array reference" I'm referring specifically to the _array record in ChapelArray.chpl).
So, it seems to me that the problem still needing solving is - how can a reference to the "official" array be converted to a reference to the local, privatized one? Ideally with no communication required (beyond whatever happened to set up the privatization?)
So, right now, there is 1 chpl_privatzedObjects array per locale. With your proposal, there's normally only 1 official array instance. That official array instance is stored on some locale.
I'm not sure I understand what you mean by 'official' array? With my proposal each privatized instance will also have the very same privatizedObjects. If you were to use your privatized instance you would use your own privatizedObjects to find other privatized instances allocated on other nodes. If you are using record forwarding you can store in the record the very same privatizedObjects.
For example...
record _array {
  var privatizedObjects : [localeDom] LocalArray;
  forwarding privatizedObjects[here.id];
}
Really whether or not the fields are kept in just a record wrapper or in the instances themselves depends on usage.
Also I would like to advocate for a potential change where all classes utilizing privatization can have record forwarding set up, or at least a template to help the user do so. Record wrapping is currently necessary for getting decent performance as it eliminates any PGAS round-trip. Whenever I privatize my classes I have to write the same boilerplate over and over again.
Also I'd like to make note that I'd be willing to create the prototype for this by making changes to _array and testing potential trade-offs before it is applied to privatization as a whole (with help of course).
So, in this example:
record _array {
  var privatizedObjects : [localeDom] LocalArray;
  forwarding privatizedObjects[here.id];
}
What is privatizedObjects? Normally a declaration of that type would be record _array but we can't have a self-including record.
For simplicity, let's suppose it's a "low level C array", e.g.
record _array {
  var privatizedObjects : _ddata(LocalArray);
  forwarding privatizedObjects[here.id];
}
OK. Now privatizedObjects is a pointer in memory to an array somewhere.
The question is, what happens if I have such an _array on Locale 0, and Locale 1 GETs the record value. How can Locale 1 find its local array?
If the answer is "Just look in privatizedObjects[1]", that requires a GET:
Here is the sequence of events:
Locale 0 has an _array, say A.
Locale 1 does a GET to read the record fields. Let's call that tmpA. tmpA is on locale 1 but the fields all point to memory on Locale 0.
I'd expect the entire record to be copied by value, including the array of privatized objects. If Locale 0 has the record, then wouldn't Locale 1 get its own copy of the record to use in future operations? If it performed a deep copy instead of a shallow copy wouldn't that resolve the issue?
I'd expect the entire record to be copied by value
That seems like it would lead to coherence problems if the original array were modified. In the current world, changes to the privatized instance are rippled to other locales' privatized copies so that they're all still "seeing" the same conceptual array. If other locales had a copy of the original array's meta-data, how would those copies be updated when necessary?
rough example:
// declare a global domain and array that require / benefit from privatization
var D = {1..n} dmapped SomethingTriggeringPrivatization(...);
var A: [D] real;
coforall loc in Locales {
  for i in 1..steps {
    ...compute on A...
    barrier();
    if (loc.id == 0) {
      D = {1..2*D.size};  // double D's size. A will also be reallocated. All locales should see these
                          // changes since it's to a global variable visible from within their lexical scope
    }
    barrier();
  }
}
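For readers less familiar with the domain/array coupling this example relies on, here is a minimal single-locale sketch. It assumes a default (non-distributed) domain, so no privatization is involved at all; it only shows that assigning to a domain reallocates the arrays declared over it while keeping the elements whose indices survive.

```chapel
// Single-locale sketch: default rectangular domain, no distribution,
// hence nothing here exercises the privatization table.
var D = {1..4};
var A: [D] real;
A = 1.0;

// Assigning to the domain resizes every array declared over it.
D = {1..2*D.size};

writeln(D);        // the new index set
writeln(A.size);   // A now has one element per index in D
writeln(A[1..4]);  // elements at retained indices keep their values
```

In the distributed case discussed here, the same assignment additionally has to reach every locale's privatized copy of D and A, which is the part under discussion.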
changes to the privatized instance are rippled to other locales' privatized copies so that they're all still "seeing" the same conceptual array
So if I'm reading this correctly, the change to resize the array does something like this...
coforall loc in Locales {
  D = {1..2*D.size};
}
Where each locale has its own privatized instance for D and A (meaning that both of them would be in the privatization table?), when A is reallocated does this mean that the privatized meta-data (such as the pid or privatizedObjects?) are replaced with new ones?
So if I'm reading this correctly, the change to resize the array does something like this...
Conceptually, I suppose you could think of it that way, but I think the more accurate way to think about it (with the caveat that it's been a long time since I've dug into this in some detail) is "the locale whose privatized copy is assigned reflects those changes to all the other locales." So I'd sketch this as:
if (here.id == 0) {
  D = {1..2*D.size};  // update my copy of D and reallocate A
  // "broadcast" my changes to the privatized class objects to all other locales' copies
}
The calls in question to implement these "re-broadcasts" are the dsiReprivatize() calls, IIRC.
when A is reallocated does this mean that the privatized meta-data (such as the pid or privatizedObjects?) are replaced with new ones?
I'm fairly certain that the A object doesn't move or change pids, that only its fields change.
Hm... I see that first I must tackle the Domain map Standard Interface before I can suggest a better solution. I'll do some more research for a while on the issue of implementing privatization without relying on _array.
Proposal: Recycle privatization ids
I'm thinking that there should be a non-blocking stack (Treiber Stack) that keeps track of recycled privatization ids, allocated on Locale#0 (same as where the counter is). Memory reclamation can be handled by QSBR. Whenever the privatized instances are 'deleted', we should push their privatization ids on the stack. Whenever a new privatization instance is requested, it should first pop from the stack, and if the stack is empty it should perform a fetchAdd on the counter (all on Locale#0) and register the new object in the privatization array.
The rationale is that by recycling these keys, we can keep down the overhead of having to call chpl_clearPrivatizedClass and chpl_newPrivatizedClass, thereby solving the issue of the unbounded expansion of the privatization array even if it was empty, per se; if it were empty, then the stack would hold all of the ids to be recycled. Again, memory reclamation is handled by QSBR (once the PR gets accepted that is).
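The pop-first, fetchAdd-on-empty policy described above can be sketched roughly as follows. This is only an illustration under stated assumptions: PidAllocator, acquire and release are hypothetical names that do not exist in the runtime, a sync-variable lock around a list stands in for the lock-free Treiber stack and QSBR, everything lives on one locale, and the List method names (pushBack, popBack) assume a Chapel 2.x release.

```chapel
use List;

// Hypothetical allocator; not part of Chapel's runtime.
class PidAllocator {
  var nextPid: atomic int;   // next never-used id (the existing counter)
  var guard: sync bool;      // empty = unlocked, full = locked
  var recycled: list(int);   // ids released by deleted objects

  // Prefer a recycled id; only grow the counter when none is available.
  proc acquire(): int {
    var pid = -1;
    guard.writeEF(true);     // lock
    if !recycled.isEmpty() then pid = recycled.popBack();
    guard.readFE();          // unlock
    if pid == -1 then pid = nextPid.fetchAdd(1);
    return pid;
  }

  // Called when a privatized object is deleted.
  proc release(pid: int) {
    guard.writeEF(true);     // lock
    recycled.pushBack(pid);
    guard.readFE();          // unlock
  }
}

var alloc = new PidAllocator();
const a = alloc.acquire();
const b = alloc.acquire();
alloc.release(a);
const c = alloc.acquire();   // reuses the id released above
writeln((a, b, c));
```

With this policy the counter only advances when no freed id exists, so a create/delete loop stays within a bounded id range instead of forcing the table to grow.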
If we were to combine @LouisJenkinsCS's object-contains-array-of-privatized-objects idea above with the privatization cache idea from https://github.com/chapel-lang/chapel/issues/10160#issuecomment-491274892 :
... privatizedObjects array element.
... reprivatize strategy currently used, but improves upon it because only those locales with a privatized copy need to update.
privatizedObjects would start out storing nil. Then, when the privatization cache needs to populate the entry for that instance, instead of simply reading from privatizedObjects, it would run a privatize call to construct the local instance at that point and store it back in the privatizedObjects array. (And then cache the result).
Currently privatization has a few drawbacks with its current descriptor-table approach. For one, as documented, privatized objects are created on all locales, even on ones they are not originally distributed over; second, they all induce extra communication to Locale 0 via a fetchAdd, as it is the sole manager of the privatization id counter. To name some of the undocumented drawbacks, it turns out that the privatization ids are not yet recycled, and as such if enough privatized objects are spawned (even if they were properly deleted and cleared from the table) they will constantly grow the table; you could have the entire table be empty, request privatization for a single object and still have it trigger a resize! As well, the issue of parallel-safe resizing, currently being explored in #8182 with usage of quiescent states, is still an ongoing problem in Computer Science and has the unique constraint of requiring nearly zero overhead for concurrent access (Stencil-PPK makes requests to chpl_getPrivatizedClass a staggering 1.2 billion times).
I propose a change in privatization, or at least in certain places where it is used in performance-critical code paths. Currently Chapel seems to have a setup like thus:
Where _newPrivatizedClass seems to do the following...
Then whenever a privatized copy is called it just makes calls to chpl_getPrivatizedClass, which is essentially an array access to chpl_privatzedObjects (names may not be 100% accurate). Now, my proposal, at least for ChapelArray, is the following:
For each privatized object, add a new hidden field (or force user to declare one) called privatizedObjects : [localeDomain] Obj; this field will be populated after all of the targeted locales finish with the privatized instances. I'd say ideally localeDomain is just the range of locale ids. Now if we want to obtain our instance, we can call privatizedObjects[here.id], or if we want to access another locale's instance we have it at our fingertips (meaning if we have network atomics we can avoid an 'executeOn' statement or something if we wanted to; I mean since it's the exact same object, it won't be hard to figure out how to access a field directly without requiring an extra PGAS round-trip by dereferencing it each time). By allowing the user to select the domain, we do not need to create one across all locales, only the ones that it is distributed over. As well, no communication is needed to go to Locale 0 to create a new instance, and there is no need to recycle descriptors. In fact, any issue with the additional memory overhead is minor and bounded compared to the ever-growing descriptor-table approach. It also would eliminate the need for memory reclamation as no table is needed.
@mppf
Edit:
A potential optimization can be made for the arrays as well. As long as they are mapped to the locale ids we can store each privatized object by its local address and we can construct the wide pointer using the locale id it is mapped to.
Another optimization would be allowing the localeDomain to also include a subLocaleDomain for NUMA architectures, if that makes sense. So it can be accessed like privatizedObjects[here.id, here.sid] or something cool like that! The sky is the limit!
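To make the per-object table concrete, here is a small self-contained sketch of the idea, not a proposed implementation. LocalState, Handle, slots, mine and makeHandle are all hypothetical names; the table is indexed by LocaleSpace (every locale) rather than a user-chosen subset, and unmanaged instances are used purely to keep the example short.

```chapel
// Hypothetical per-locale state; stands in for a privatized class.
class LocalState {
  var hits: int;
}

// Hypothetical handle: one slot per locale, indexed by locale id.
record Handle {
  var slots: [LocaleSpace] unmanaged LocalState?;

  // Return the instance that lives on the locale running this call.
  proc mine() {
    return slots[here.id]!;
  }
}

// Create one instance on each locale and record it in the table.
proc makeHandle(): Handle {
  var h: Handle;
  coforall loc in Locales with (ref h) do on loc {
    h.slots[loc.id] = new unmanaged LocalState();
  }
  return h;
}

var h = makeHandle();

// Each locale updates only its own instance. Note that h itself lives
// on Locale 0 here, so reading h.slots from another locale is still a
// remote access unless the table is copied or cached locally.
coforall loc in Locales do on loc {
  h.mine().hits += 1;
}

var total = 0;
for s in h.slots do total += s!.hits;
writeln(total);               // one hit per locale

for s in h.slots do delete s; // unmanaged, so clean up explicitly
```

No id counter or global table is involved, which is the appeal; the open question raised in the replies is how a remote locale reaches its slot without first fetching the table from wherever the handle lives.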