Privatization Overhaul - Githubissues

LouisJenkinsCS commented 6 years ago

Currently privatization has a few drawbacks with its descriptor-table approach. For one, as documented, privatized objects are created on all locales, even on ones they are not originally distributed over; second, they all induce extra communication to Locale 0 via a fetchAdd, as it is the sole manager of the privatization id counter. To name some of the undocumented drawbacks, it turns out that the privatization ids are not yet recycled, and as such if enough privatized objects are spawned (even if they were properly deleted and cleared from the table) they will constantly grow the table; you could have the entire table be empty, request privatization for a single object and still have it trigger a resize! As well, parallel-safe resizing, currently being explored in #8182 with the use of quiescent states, is still an open problem in Computer Science, and here it has the unique constraint of requiring nearly zero overhead for concurrent access (Stencil-PPK calls chpl_getPrivatizedClass a staggering 1.2 billion times).
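To make the "empty table still resizes" point concrete, here is a minimal Python model, not Chapel runtime code. The class name, the initial capacity of 4, and the doubling growth policy are all assumptions made for illustration; only the ever-increasing id counter and the lack of recycling come from the description above.

```python
class DescriptorTable:
    """Toy model of a privatization table indexed by pid (hypothetical names)."""

    def __init__(self, capacity=4):
        self.slots = [None] * capacity
        self.next_pid = 0      # stands in for the counter kept on Locale 0
        self.resizes = 0

    def new_privatized(self, obj):
        pid = self.next_pid    # models fetchAdd(1): ids only ever increase
        self.next_pid += 1
        if pid >= len(self.slots):
            # Assumed policy: double the table, even if every slot is empty.
            self.slots.extend([None] * len(self.slots))
            self.resizes += 1
        self.slots[pid] = obj
        return pid

    def clear_privatized(self, pid):
        self.slots[pid] = None  # slot is emptied, but the pid is never reused


table = DescriptorTable(capacity=4)
for _ in range(4):
    pid = table.new_privatized(object())
    table.clear_privatized(pid)

live_before = sum(s is not None for s in table.slots)   # nothing is live
table.new_privatized(object())                          # one more request
print(live_before, table.resizes, len(table.slots))
```

In this model the table holds no live objects before the last request, yet that single request still forces a resize, because the fresh id falls past the end of the table.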

I propose a change in privatization, or at least in certain places where it is used in performance-critical code paths. Currently Chapel seems to have a setup like thus:

class Obj {
    proc Obj(...) {
        // Initialize object...
        pid = _newPrivatizedClass(this);
    }

    proc Obj(other, ...) {
        // Initialize privatized object as unique or based on 'other' instance
        pid = other.pid;
    }
}

Where _newPrivatizedClass seems to do the following...

// Allocated on Locale #0
const newPid = numPrivatizedObjects.fetchAdd(1);
coforall loc in Locales do on loc {
    var privatizedObj = new Obj(original, ...);
    // chpl_privatizedObjects[newPid] = privatizedObj
    chpl_newPrivatizedClass(newPid, privatizedObj);
}

Then whenever a privatized copy is called it just makes calls to chpl_getPrivatizedClass, which is essentially an array access to chpl_privatizedObjects (names may not be 100% accurate). Now, my proposal, at least for ChapelArray, is the following:

For each privatized object, add a new hidden field (or force the user to declare one) called privatizedObjects : [localeDomain] Obj; this field will be populated after all of the targeted locales finish with the privatized instances. I'd say ideally localeDomain is just the range of locale ids. Now if we want to obtain our instance, we can call privatizedObjects[here.id], or if we want to access another locale's instance we have it at our fingertips (meaning if we have network atomics we can avoid an 'executeOn' statement or something if we wanted to; I mean since it's the exact same object, it won't be hard to figure out how to access a field directly without requiring an extra PGAS round-trip by dereferencing it each time). By allowing the user to select the domain, we do not need to create one across all locales, only the ones that it is distributed over. As well, no communication is needed to go to Locale 0 to create a new instance, and there is no need to recycle descriptors. In fact, any issue with the additional memory overhead is minor and bounded compared to the ever-growing descriptor-table approach. It also would eliminate the need for memory reclamation as no table is needed.
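A rough Python model of this idea follows; it is only a sketch of the data layout, not Chapel code. The names Obj, privatize, target_locales and here_id are illustrative assumptions, and a plain dict stands in for an array over localeDomain.

```python
class Obj:
    """Toy privatized object: every instance carries the same locale -> instance map."""

    def __init__(self, locale_id, payload):
        self.locale_id = locale_id
        self.payload = payload
        self.privatized_objects = {}   # filled in once all instances exist


def privatize(payload, target_locales):
    # Create one instance per *targeted* locale only; no global id counter is involved.
    instances = {loc: Obj(loc, payload) for loc in target_locales}
    # Populate the hidden field after all targeted locales have finished.
    for inst in instances.values():
        inst.privatized_objects = dict(instances)
    return instances


instances = privatize("block-distributed data", target_locales=[1, 3])
here_id = 3                                   # stands in for here.id
mine = instances[here_id].privatized_objects[here_id]
other = mine.privatized_objects[1]            # another locale's instance, no table lookup
```

Locales 0 and 2 never get an instance in this model, and nothing resembling a global descriptor is allocated or needs recycling.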

@mppf

Edit:

A potential optimization can be made for the arrays as well. As long as they are mapped to the locale ids we can store each privatized object by its local address and we can construct the wide pointer using the locale id it is mapped to.

Another optimization would be allowing the localeDomain to also include a subLocaleDomain for NUMA architectures, if that makes sense. So it can be accessed like privatizedObjects[here.id, here.sid] or something cool like that! The sky is the limit!

bradcray commented 6 years ago

@LouisJenkinsCS : Thanks for shining a light on this. The lack of recycling / reuse in the privatization table has been a concern for me / us* since it was first implemented. We knew it was problematic, but needed to move forward with the intention of getting back to it later, which then never happened. In fact, if you were to ask me about ancient dirty laundry that's festering somewhere in the code base and needing attention without current expertise on the team, this is one of the first things that would occur to me. I don't recall enough about what we have on master today to have great insight about what should be done to improve it, but wanted to express my support for moving from where we are today to something smarter / saner / more scalable.

With respect to your specific proposal, I'd need to spend more time with it to have an opinion, but wanted to voice my appreciation of your looking into this issue in the meantime.

LouisJenkinsCS commented 6 years ago

Appreciate the kind words, Brad! I'm 100% certain Chapel is going to be the future go-to programming language for high performance computing, and by gosh am I going to help see it through!

mppf commented 6 years ago

> Then whenever a privatized copy is called it just makes calls to chpl_getPrivatizedClass, which is essentially an array access to chpl_privatizedObjects (names may not be 100% accurate). Now, my proposal, at least for ChapelArray, is the following:
>
> For each privatized object, add a new hidden field (or force the user to declare one) called privatizedObjects : [localeDomain] Obj; this field will be populated after all of the targeted locales finish with the privatized instances.

So, right now, there is 1 chpl_privatizedObjects array per locale. With your proposal, there's normally only 1 official array instance. That official array instance is stored on some locale.

Right now, if one locale has the "official" array instance, it can pass a reference to the privatized instance to another locale with no additional communication required - i.e. it just sends the integer privatization ID. In fact, all array references become just references to the privatized integer ID, so any time the array reference is moved around, it includes this integer ID. (By "array reference" I'm referring specifically to the _array record in ChapelArray.chpl).

So, it seems to me that the problem still needing solving is - how can a reference to the "official" array be converted to a reference to the local, privatized one? Ideally with no communication required (beyond whatever happened to set up the privatization?)

LouisJenkinsCS commented 6 years ago

> So, right now, there is 1 chpl_privatizedObjects array per locale. With your proposal, there's normally only 1 official array instance. That official array instance is stored on some locale.

I'm not sure I understand what you mean by 'official' array? With my proposal each privatized instance will also have the very same privatizedObjects. If you were to use your privatized instance you would use your own privatizedObjects to find other privatized instances allocated on other nodes. If you are using record forwarding you can store the very same privatizedObjects in the record.

For example...

record _array {
   privatizedObjects : [localeDom] LocalArray;
   forwarding privatizedObjects[here.id];
}

Really whether or not the fields are kept in just a record wrapper or in the instances themselves depends on usage.
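As a loose analogy for the record-wrapper variant, the Python sketch below forwards attribute access to the current locale's instance. It is not Chapel's forwarding feature, just a model of the same delegation idea; ArrayRecord, LocalArray, describe and here_id are hypothetical names.

```python
class LocalArray:
    """Toy per-locale instance."""

    def __init__(self, locale_id):
        self.locale_id = locale_id

    def describe(self):
        return f"local array on locale {self.locale_id}"


class ArrayRecord:
    """Wrapper that forwards unknown attributes to the current locale's instance."""

    def __init__(self, privatized_objects, here_id):
        self._privatized_objects = privatized_objects
        self._here_id = here_id          # stands in for here.id

    def __getattr__(self, name):
        # Only called when normal lookup fails, so the two fields above are safe.
        return getattr(self._privatized_objects[self._here_id], name)


objs = [LocalArray(i) for i in range(3)]
on_locale_1 = ArrayRecord(objs, here_id=1)
on_locale_2 = ArrayRecord(objs, here_id=2)
print(on_locale_1.describe())
print(on_locale_2.describe())
```

Both wrappers share the same list of instances, and each resolves calls against the instance for its own locale.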

LouisJenkinsCS commented 6 years ago

Also I would like to advocate for a potential change where all classes utilizing privatization can have record forwarding set up, or at least a template to help the user do so. Record wrapping currently is necessary for getting decent performance as it eliminates any PGAS round-trip. Whenever I privatize my classes I have to do the same boilerplate over and over again.

LouisJenkinsCS commented 6 years ago

Also I'd like to make note that I'd be willing to create the prototype for this by making changes to _array and testing potential trade-offs before it is applied to privatization as a whole (with help of course)

mppf commented 6 years ago

So, in this example:

record _array {
   privatizedObjects : [localeDom] LocalArray;
   forwarding privatizedObjects[here.id];
}

What is privatizedObjects? Normally a declaration of that type would be record _array but we can't have a self-including record.

For simplicity, let's suppose it's a "low level C array", e.g.

record _array {
   privatizedObjects : _ddata(LocalArray);
   forwarding privatizedObjects[here.id];
}

OK. Now privatizedObjects is a pointer in memory to an array somewhere.

The question is, what happens if I have such an _array on Locale 0, and Locale 1 GETs the record value? How can Locale 1 find its local array?

If the answer is "Just look in privatizedObjects[1]", that requires a GET:

Here is the sequence of events:

1. Locale 0 has an _array, say A.
2. Locale 1 refers to that array variable and uses a GET to read the record fields. Let's call that tmpA. tmpA is on locale 1 but the fields all point to memory on Locale 0.
3. Now Locale 1 would like to access an array element.
4. To find the privatized version, Locale 1 looks in tmpA.privatizedObjects[1]. But tmpA.privatizedObjects is a wide pointer that points to memory allocated on Locale 0, so this operation requires a GET.
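The sequence above can be modeled by counting remote reads. This is a toy Python model under stated assumptions: Cluster, the (owner, key) wide-pointer tuples, and the memory keys are all invented for illustration, and the pid-based half reflects the current scheme only as described in this thread.

```python
class Cluster:
    """Toy PGAS model: each locale has its own memory; remote reads count as GETs."""

    def __init__(self, num_locales):
        self.memory = [dict() for _ in range(num_locales)]
        self.gets = 0

    def read(self, here, wide_ptr):
        owner, key = wide_ptr
        if owner != here:
            self.gets += 1          # remote memory access
        return self.memory[owner][key]


cluster = Cluster(num_locales=2)

# Locale 0 owns the record A and the privatizedObjects array it points to.
cluster.memory[0]["privObjs"] = [(0, "A_local0"), (1, "A_local1")]
cluster.memory[0]["A"] = {"privatizedObjects": (0, "privObjs")}
cluster.memory[0]["A_local0"] = "instance on locale 0"
cluster.memory[1]["A_local1"] = "instance on locale 1"

here = 1
tmpA = cluster.read(here, (0, "A"))                     # GET: copy the record fields
priv = cluster.read(here, tmpA["privatizedObjects"])    # GET: the array lives on locale 0
local_instance = cluster.read(here, priv[here])         # local read, no GET
gets_with_array = cluster.gets

# Current scheme: the record carries an integer pid; each locale indexes its own table.
cluster.gets = 0
cluster.memory[0]["A_pid"] = {"pid": 0}
cluster.memory[1]["table"] = [(1, "A_local1")]
tmpA = cluster.read(here, (0, "A_pid"))                 # GET: copy the record fields
table = cluster.read(here, (1, "table"))                # local table, no GET
local_instance_pid = cluster.read(here, table[tmpA["pid"]])
gets_with_pid = cluster.gets
print(gets_with_array, gets_with_pid)
```

In this model the array-in-record layout costs one more GET than the pid lookup, which is the extra round-trip the question is about.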

LouisJenkinsCS commented 6 years ago

I'd expect the entire record to be copied by value, including the array of privatized objects. If Locale 0 has the record, then wouldn't Locale 1 get its own copy of the record to use in future operations? If it performed a deep copy instead of a shallow copy wouldn't that resolve the issue?

bradcray commented 6 years ago

> I'd expect the entire record to be copied by value

That seems like it would lead to coherence problems if the original array were modified. In the current world, changes to the privatized instance are rippled to other locales' privatized copies so that they're all still "seeing" the same conceptual array. If other locales had a copy of the original array's meta-data, how would those copies be updated when necessary?

rough example:

// declare a global domain and array that require / benefit from privatization
var D = {1..n} dmapped SomethingTriggeringPrivatization(...);
var A: [D] real;

coforall loc in Locales {
  for i in 1..steps {
    ...compute on A...
    barrier();
    if (loc.id == 0) {
      D = {1..2*D.size};  // double D's size.  A will also be reallocated.  All locales should see these
                          // changes since it's to a global variable visible from within their lexical scope
    }
    barrier();
  }
}

LouisJenkinsCS commented 6 years ago

> changes to the privatized instance are rippled to other locales' privatized copies so that they're all still "seeing" the same conceptual array

So if I'm reading this correctly, the change to resize the array does something like this...

coforall loc in Locales {
   D = {1..2*D.size};
}

Where each locale has its own privatized instance for D and A (meaning that both of them would be in the privatization table?), when A is reallocated does this mean that the privatized meta-data (such as the pid or privatizedObjects?) are replaced with new ones?

bradcray commented 6 years ago

> So if I'm reading this correctly, the change to resize the array does something like this...

Conceptually, I suppose you could think of it that way, but I think the more accurate way to think about it (with the caveat that it's been a long time since I've dug into this in some detail) is "the locale whose privatized copy is assigned reflects those changes to all the other locales." So I'd sketch this as:

if (here.id == 0) {
  D = {1..2*D.size};  // update my copy of D and reallocate A
  // "broadcast" my changes to the privatized class objects to all other locales' copies
}

The calls that implement these "re-broadcasts" are the dsiReprivatize() routines, IIRC.

> when A is reallocated does this mean that the privatized meta-data (such as the pid or privatizedObjects?) are replaced with new ones?

I'm fairly certain that the A object doesn't move or change pids, that only its fields change.
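A small Python model of that behavior, under the assumption that re-broadcasting copies the assigning locale's fields into the other locales' existing objects. The reprivatize function here is a hypothetical stand-in and does not reflect the real dsiReprivatize() signature.

```python
class PrivatizedDomain:
    """Toy privatized copy of a 1-D domain {low..high}."""

    def __init__(self, pid, low, high):
        self.pid = pid
        self.low, self.high = low, high

    def size(self):
        return self.high - self.low + 1


NUM_LOCALES = 3
PID = 0
# One privatization table per locale, each holding its own copy under the same pid.
tables = [{PID: PrivatizedDomain(PID, 1, 10)} for _ in range(NUM_LOCALES)]


def reprivatize(tables, pid, source_locale):
    """Copy the source copy's fields into every other locale's existing object."""
    src = tables[source_locale][pid]
    for loc, table in enumerate(tables):
        if loc != source_locale:
            table[pid].low, table[pid].high = src.low, src.high


ids_before = [id(t[PID]) for t in tables]
d0 = tables[0][PID]
d0.high = d0.low + 2 * d0.size() - 1     # locale 0 doubles its copy: {1..2*D.size}
reprivatize(tables, PID, source_locale=0)
ids_after = [id(t[PID]) for t in tables]
sizes = [t[PID].size() for t in tables]
print(sizes)
```

The objects and the pid stay the same on every locale; only the fields are overwritten.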

LouisJenkinsCS commented 6 years ago

Hm... I see that first I must tackle the Domain map Standard Interface before I can suggest a better solution. I'll do some more research for a while on the issue of implementing privatization without relying on _array.

LouisJenkinsCS commented 6 years ago

Proposal: Recycle privatization ids

I'm thinking that there should be a non-blocking stack (Treiber Stack), allocated on Locale#0 (same as where the counter is), that keeps track of recycled privatization ids. Memory reclamation can be handled by QSBR. Whenever the privatized instances are 'deleted', we should push their privatization ids on the stack. Whenever a new privatized instance is requested, it should first pop from the stack, and if the stack is empty it should perform a fetchAdd on the counter (all on Locale#0) and register the new object in the privatization array.

The rationale is that by recycling these keys, we can keep down the overhead of calling chpl_clearPrivatizedClass and chpl_newPrivatizedClass, and thereby solve the issue of the privatization array expanding without bound even when it is empty, per se; if it were empty, then the stack would hold all of the ids to be recycled. Again, memory reclamation is handled by QSBR (once the PR gets accepted, that is).
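The pop-then-fetchAdd policy can be sketched in Python as below. This is deliberately not a Treiber stack: a lock stands in for the atomic operations, and QSBR-style reclamation is omitted entirely. PidAllocator and its method names are hypothetical.

```python
import threading


class PidAllocator:
    """Toy id allocator: pop a recycled pid if one exists, else bump the counter."""

    def __init__(self):
        self._lock = threading.Lock()   # stands in for the atomic CAS / fetchAdd
        self._free = []                 # LIFO stack of recycled pids
        self._next = 0

    def acquire(self):
        with self._lock:
            if self._free:
                return self._free.pop()
            pid = self._next            # models fetchAdd(1) on the counter
            self._next += 1
            return pid

    def release(self, pid):
        with self._lock:
            self._free.append(pid)

    @property
    def high_water_mark(self):
        return self._next


alloc = PidAllocator()
first = [alloc.acquire() for _ in range(4)]
for pid in first:
    alloc.release(pid)
second = [alloc.acquire() for _ in range(4)]    # reuses the released ids
print(first, second, alloc.high_water_mark)
```

Because every released id is reused before the counter is touched again, the high-water mark (and so the required table size) tracks the peak number of live objects rather than the total ever created.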

mppf commented 5 years ago

If we were to combine @LouisJenkinsCS's object-contains-array-of-privatized-objects idea above with the privatization cache idea from https://github.com/chapel-lang/chapel/issues/10160#issuecomment-491274892 :

- the privatization cache would simply return the current locale's privatized copy of a data structure given the wide address of the "main" instance. It is just a map from pointer -> pointer. Since we view it as a cache, it can simply discard old data (and so issues of recycling / memory reclamation are reduced). Any data discarded in this manner that is needed again can be recomputed by reading the relevant remote privatizedObjects array element.
- it avoids the coherence problems when the data is modified. It can still use the reprivatize strategy currently used, but improves upon it because only those locales with a privatized copy need to update.
- it allows for lazy privatization to be implemented easily but does not require it. In lazy privatization, the privatizedObjects would start out storing nil. Then, when the privatization cache needs to populate the entry for that instance, instead of simply reading from privatizedObjects, it would run a privatize call to construct the local instance at that point and store it back in the privatizedObjects array. (And then cache the result).
- the advantages @LouisJenkinsCS was looking for are still present:
  - privatization could occur only on select locales
  - no communication is needed to go to Locale 0 to create a new instance (there are no global IDs)
  - no need to recycle global IDs
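A minimal Python model of the combined idea, including the lazy variant, might look like this. All names are hypothetical; id(main) stands in for the wide address of the "main" instance, and a list with None entries stands in for a privatizedObjects array that starts out storing nil.

```python
class MainInstance:
    """Toy 'main' instance holding one slot per locale; None means not yet privatized."""

    def __init__(self, num_locales, data):
        self.data = data
        self.privatized_objects = [None] * num_locales
        self.privatize_calls = 0

    def privatize(self, locale_id):
        self.privatize_calls += 1
        return {"locale": locale_id, "data": self.data}


class PrivatizationCache:
    """Per-locale map from the main instance's address to the local copy."""

    def __init__(self, locale_id):
        self.locale_id = locale_id
        self.entries = {}
        self.misses = 0

    def get(self, main):
        key = id(main)                        # stands in for the wide address
        if key not in self.entries:
            self.misses += 1
            local = main.privatized_objects[self.locale_id]   # would be a remote read
            if local is None:                 # lazy privatization on first use
                local = main.privatize(self.locale_id)
                main.privatized_objects[self.locale_id] = local
            self.entries[key] = local
        return self.entries[key]

    def evict_all(self):
        self.entries.clear()                  # safe: entries can be recomputed


main = MainInstance(num_locales=4, data="A")
cache = PrivatizationCache(locale_id=2)
a = cache.get(main)      # miss, privatizes lazily
b = cache.get(main)      # hit
cache.evict_all()
c = cache.get(main)      # miss, re-read from privatized_objects, no new privatize call
print(cache.misses, main.privatize_calls)
```

Eviction loses nothing in this model: the second miss finds the instance already stored in privatized_objects, and locales that never ask for a copy never get one.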

chapel-lang / chapel

Privatization Overhaul #8483