chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Enable Chapel to provide ideal semantics for non-standard distributions #22810

Open amitha-c opened 1 year ago

amitha-c commented 1 year ago

This is a continuation of the discussion in #17999.

Background:

In our group, we are developing an array distribution for data residing on Fabric-Attached Memory (FAM), a disaggregated memory that is decoupled from the compute nodes. Refer to OpenFAM for more details.

One of our goals in this exercise is to provide a user experience similar to that of the existing Chapel array distributions. Because Chapel's array interfaces were originally designed for memory that is tied to the compute nodes, it is challenging to provide consistent semantics for all arrays. One operation for which it is hard to provide ideal semantics on disaggregated memory like FAM is random indexed element access, which is implemented through the dsiAccess() interface.

Chapel expects the dsiAccess() method to return a reference to the array element, and we cannot return a reference or handle to disaggregated memory. We worked around this by having dsiAccess() return a wrapper class object. A wrapper instance is created on every dsiAccess() call and stores the information required for FAM access. That means every random indexed element access is translated internally into an access to a wrapper object. The actual FAM read and write operations are then performed through wrapper class methods.

For FAM reads, we implemented a method called "this()", and for FAM writes, an overload of the "=" operator. Because the method "this" must always be declared with parentheses, we have to add an extra pair of parentheses at the end of an array element access to invoke it.

class wrapper {
  var fam_handle;
  var offset;
  type data_type;
  var value;

  proc this() {
    // read data from FAM into value and return it
    return value;
  }

  ...
}

// Indexed element access
Arr[100] = 100;     // write a value to the element

var x = Arr[50]();  // read the value; the trailing () invokes this()

We went through some of the discussion under #17999 and #21233. Below is a summary of the approaches that we felt were feasible.

Approach 1: Support a parentheses-less this method

Provide the capability in Chapel to define a paren-less this method for class objects. For this approach to solve our semantic issue, we may also need the compiler to allow implicit conversion from the wrapper class object to the array element type. We can achieve this through user-defined coercion functions, but that may not work in cases where the type of the target variable in the element read is not explicitly specified, as explained in our previous mail.

    // Define coercion method
    proc WrapperType.coercible(type toType) param {
      return true;
    }

    var a: int = myWrapper;  // "implicit conversion" to int?
    proc f(arg: int) { }
    f(myWrapper);            // "implicit conversion" to int?

    var b = myWrapper;  // should b have type `WrapperType` or `int`? ambiguity

Approach 2: Support for custom reference types

Provide the ability to create custom references, so that users can define a reference type for data in distant memory that is treated like a typical Chapel reference. This avoids the use of wrappers in array methods and lets array interfaces use these custom references wherever data is accessed by reference. Since custom references are treated like typical Chapel references, we could implement deref() methods for them. These deref() methods can be overloaded to read or write the distant memory depending on whether the access is made as an l-value or an r-value.

Our proposal is to store in the custom reference the details required to locate and access the respective data in distant memory, and to implement variants of deref() that perform reads and writes depending on how the reference is accessed. Any write to FAM through this reference should propagate to FAM immediately.
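
To make the intent of Approach 2 concrete, here is a minimal library-level sketch of the information such a reference would carry. It is not the proposed language feature: famStore, famRef, load() and store() are hypothetical names, and an ordinary local array stands in for FAM. The explicit load()/store() calls play the role that the overloaded deref() variants would play for r-value and l-value accesses.

```chapel
// Hypothetical stand-in for FAM: an ordinary array on the current locale.
var famStore: [0..#16] int;

// Hypothetical "custom reference": it holds only what is needed to locate
// the element and never caches a copy of its value.
record famRef {
  var offset: int;

  // r-value use: fetch the element from the (simulated) FAM on every read
  proc load(): int {
    return famStore[offset];
  }

  // l-value use: write through to the (simulated) FAM immediately
  proc store(val: int) {
    famStore[offset] = val;
  }
}

var r = new famRef(5);
r.store(42);           // the write is visible in famStore right away
writeln(r.load());     // read back through the reference
writeln(famStore[5]);  // read the backing store directly
```

Because the record stores only a location, every read goes back to the backing store and every write lands there at once, which is the immediate-propagation behavior described above.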

e-kayrakli commented 1 year ago

Can you clarify your Approach 1 a bit more? To me it feels more about user-defined coercions than parenless this. i.e. if Arr[50] returned a wrapper type, then for var x: int = Arr[50]; to work all we need is the ability to coerce the wrapper type to int, isn't it? In other words, this shouldn't require .this to be called on the wrapper instance. But probably something like .coerce under the hood? If I am not mistaken that this doesn't require any change to this interface, it feels more appealing to me than Approach 2. Although I can follow the thought that led to Approach 2, it is more vague and a bit scary without fleshing it out some more.

amitha-c commented 12 months ago

With user-defined coercions, we will have the ability to coerce the wrapper type to any desired array element type. The only reason we proposed using wrapper.this is to read the element value from FAM on every indexed read access. This is how we currently read array element values from FAM in our prototype implementation.

But if there is a better way of handling FAM access by calling something like .coerce underneath, without the need to implement wrapper.this as you suggested, that also sounds good.

bradcray commented 11 months ago

Focusing on option 1 for now:

First, a high-level question for @e-kayrakli: As you're working on GPU support, do you find yourself needing similar capabilities as this where the special memory flavors might require the ability to refer to a value or location that isn't sufficiently local to simply be represented as a ref or wide-ref as we do today? (or: do you anticipate us potentially needing such things in the future given architectural trends?)

Though I previously proposed the user-defined coercions (and I think the paren-less this method... possibly as a way of expressing a potential coercion?), I have one thing that worries me about it as I think about it more. Let's say that we have something like the following (where I'm just sketching):

record FamVarWrapper {
  type t;  // this record acts like a value of type 't' that happens to be stored on FAM
  var addr: FamAddr;  // this field is whatever it takes to refer to the location of the element on FAM

  // imagine this is the mechanism that will permit an instance of this record to coerce to a value of type t, though we might express it in other ways
  proc this { 
    var famval: t = FAM_Get(addr, t);  // fetch the value from FAM into a local temporary
    return famval;  // and return it
  }
  ...  // potentially some other stuff here
}

class FAMArray {
  type eltType;
  ...
  proc dsiAccess(inds: ...) {  // an access to a FAM array returns a wrapper object rather than a value of 'eltType'
    return new FamVarWrapper(eltType, computeAddrFromIndices(inds));
  }
}

This seems like it would help with cases that read the array, like:

var i: int = MyIntFamArray[i, j];
foo(MyIntFamArray[i, j]);

proc foo(x: int) { ... }

because the access of the FAM array will return the wrapper record and then the this (or whatever) tells the compiler that the wrapper record can be coerced to int and so stored into i or passed to foo(). We probably also want a way to opt into having this capability kick in for cases like:

var j = MyIntFamArray[i, j];

where we'd want j to end up being an int not a FamVarWrapper. (where I believe @benharsh has been thinking a bit about how to make this happen in the dyno world?)

And we could support assignments to the FAM array through the wrapper:

MyIntFamArray[i, j] = 42;

by using operator overloading along the lines of:

operator =(lhs: FamVarWrapper, rhs: FamVarWrapper.t) {
    FAM_Put(lhs.addr, rhs);  // store the rhs value into FAM memory
}  

But the case I'm worried about is the following:

proc bar(ref i: int) { ... }
bar(MyIntFamArray[i, j]);

since the coercion can only give us a new value, not a reference to the integer as this routine needs. So how do we support generalized references to values in FAM through the wrapper?

Taking a stab in the dark, if we had access to the source code for bar() still (i.e., weren't doing separate compilation), we could say that user-defined coercions in the presence of ref arguments might create a specialization of bar() that accepts the FamVarWrapper and specializes any operations within the routine w.r.t. it — for example, if the routine was:

proc bar(ref i: int) {
  i = 33;
  baz(i);
}

proc baz(ref i: int) { ... }

then maybe we could specialize it to:

proc bar(i: FamVarWrapper(int)) {
  i = 33;  // call the overload of assignment to FamVarWrapper
  baz(i);  // recursively specialize baz() in the same way
}

But I don't see how this could work in the presence of separate compilation where bar()'s definition wasn't available for specialization.

Another idea: We could consider calling bar() on a temporary integer and then storing the result back into the FAM wrapper after the fact? Like:

var tmp: int = MyIntFamArray[i, j];
bar(tmp);
MyIntFamArray[i, j] = tmp;

This seems like it would work in many cases, but not in ones in which the array element might be referred to by other means within bar() (like via an alias, or another direct access to MyIntFamArray, which could just be considered another way of aliasing it). This isn't something we could guarantee today, but if we were to proceed with something like https://github.com/Cray/chapel-private/issues/5039 (for those who can't see it: an issue asking about permitting the implementation to assume arguments to a routine can't alias unless the user goes to extra pains to do so), maybe it's a possibility?
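
The aliasing hazard with the temporary-based rewrite can be reproduced with plain Chapel. The sketch below uses an ordinary array, elems, as a hypothetical stand-in for the FAM array, and a hypothetical routine, bump(), that updates the element both through its ref formal and through a direct access to the array.

```chapel
var elems: [0..9] int;  // hypothetical stand-in for MyIntFamArray

proc bump(ref x: int) {
  x += 1;          // update through the formal
  elems[3] += 10;  // update the same element through a direct access
}

// True reference semantics: both updates land on the same element.
elems[3] = 0;
bump(elems[3]);
const viaRef = elems[3];

// Copy-in/copy-out through a temporary: the update made through the
// direct access is overwritten by the write-back.
elems[3] = 0;
var tmp = elems[3];
bump(tmp);
elems[3] = tmp;
const viaCopy = elems[3];

writeln(viaRef, " ", viaCopy);  // prints "11 1"
```

With a real reference the element ends up as 11, while the temporary-based version leaves it at 1, so the rewrite is only safe when the callee cannot reach the element by another path.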

I know Michael had some other proposals on https://github.com/chapel-lang/chapel/issues/17999, but IIRC, I think they had similar challenges in terms of not having an obvious (to me, anyway) story about how to handle general references.

Proposal 2 is vague enough as written up here that, like Engin, it's hard for me to know what to say about it. Though it's interesting to me that, thematically, the ideas I'm kicking around here are a lot like creating a user-defined reference, simply one that the compiler need not know a lot about (if we were to introduce some notion of user-defined coercions).

e-kayrakli commented 11 months ago

> First, a high-level question for @e-kayrakli: As you're working on GPU support, do you find yourself needing similar capabilities as this where the special memory flavors might require the ability to refer to a value or location that isn't sufficiently local to simply be represented as a ref or wide-ref as we do today? (or: do you anticipate us potentially needing such things in the future given architectural trends?)

Kind of. I've sometimes mused about the idea of a semi-wide-ref in the past. We use wide pointers today to differentiate between GPU and CPU-based allocations. However, I imagine single-locale setups are going to be common among folks who want to use GPU support, in which case our wide pointers today are a bit "too wide". More than that, the heuristics for widening for GPU vs. widening for across-the-network data are a bit different:

on here.gpus[0] {
  var Arr: [1..n] int;  // will allocate on device
  Arr[1] = 10; // will execute on host
}

For across-the-network widening, you don't really need to widen Arr. But for GPUs, you want to widen it so that you can do a PUT to the device memory from the host.

Right now, we are widening things a lot with the GPU locale model to be safe. But down the road, we want to tighten things up a bit (I think --gpu-specialization will help a lot when we turn it on by default). As part of that tightening, we'll have to think a bit more about when/what we want to widen. Part of the question could be "how wide?" if we imagine a new type of a wide pointer.

I am uncertain how much of that can be projected into FAM support. As far as I can understand, a FAM-wide pointer could be a typical wide pointer with some sort of sentinel value for locale ID (something negative?), and sublocale ID could represent if having multiple FAM nodes is a thing where there are NUMA effects accessing different parts of the FAM. But in any case, if that's somewhat acceptable, maybe we'll need a user-facing way to create a wide pointer, and runtime support to handle such pointers?
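
As a rough illustration of the sentinel idea, here is a user-level sketch. It is not how Chapel's runtime represents wide pointers: the record widePtrSketch, its fields, and the sentinel famLocaleId are all hypothetical, and the only assumption taken from the text is that a negative locale ID could mark FAM while the sublocale ID selects a FAM node or region.

```chapel
// Hypothetical sentinel: real locale IDs are non-negative, so a negative
// value can mark "this address lives on FAM".
param famLocaleId = -1;

// Hypothetical user-level model of a wide pointer.
record widePtrSketch {
  var localeId: int;     // famLocaleId marks FAM rather than a compute node
  var sublocaleId: int;  // could select a FAM node or region
  var addr: uint;        // address or offset within that node or region

  proc isFam(): bool {
    return localeId == famLocaleId;
  }
}

const onNode = new widePtrSketch(0, 0, 0x1000: uint);
const onFam = new widePtrSketch(famLocaleId, 2, 0x40: uint);

writeln(onNode.isFam(), " ", onFam.isFam());  // prints "false true"
```

A runtime that dispatched on a check like isFam() could route accesses either to the usual communication layer or to FAM get/put calls, which is the kind of runtime support the comment above asks about.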