chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Rethink "write to array element" interface? #17999

Open bradcray opened 3 years ago

bradcray commented 3 years ago

In two separate contexts that I've been working with recently, I've been facing challenges with how we implement writes to array elements, for cases like this: A[i] = j;

Specifically, the current interface for writing to an array essentially accesses into the array and returns a reference to the array element that should be written. This works fine for traditional/simple arrays, but for some more complicated approaches that we might like to express, it raises challenges:

I'm looking for inspiration as to how we might be able to adjust our array machinery to support these cases, ideally without too much fuss for the array developer.

mppf commented 3 years ago

With our current interface, there's no hook for causing that "write back" to occur.

Can you show in an example program where you would want it to occur? I have for a long time been thinking that we should have user-facing data structures that are aware of the MCM. What that means to me specifically is that in a case like this, the data structure can cache something, but it hooks the read/write fence operations that we are doing anyway (conceptually always, but in practice for cache-remote these actually turn into some kind of call). In that event, these cached values can be "written back" on the fence that causes them to need to be stored.

Besides those kinds of approaches, the other potential direction I can imagine that is very different from what we have today is to have some sort of "reference" record type that the array API can provide and return. But it would go beyond just ref in that maybe it has some operations that can occur when it goes out of scope. Maybe the compiler could convert setting this type into some sort of call on it. (In a way it arguably already does, with operator =). AFAIK the main issues with this approach in other languages are that it's hard to write and that it requires user-defined implicit conversions. But I imagine it is something we could make pretty smooth if we wanted to make it a language feature.

bradcray commented 3 years ago

Can you show in an example program where you would want it to occur?

A[i] = 42;

where A's domain map is:

I have for a long time been thinking that we should have user-facing data structures that are aware of the MCM.

I find that a really intriguing idea, particularly if the pattern can be made reasonably accessible / straightforward for a user to write and reason about. The one slight hesitation I have about it is whether it would be too lazy / passive for some cases that would want to use more of an ("eager") write-through approach rather than an ("eventual") write-back approach.

some sort of "reference" record type

This is more where my own thinking has been, where I'd been wondering whether we'd want to support a paren-less this method on such records to support implicit reads on the RHS while operator = could support writes when it's on the LHS. But I worry that only supporting writes through operator = is too limiting. So then I think "would we support a proc this ref version for l-value contexts?", but then I think we get back into the same problem as in the OP (?). And, like you said, this approach would feel a little clunky if we didn't do something more to streamline it.

mppf commented 3 years ago

To be clear, the MCM-aware user-facing data structure idea would combine with some sort of reference record type - where the data structure author, in implementing the reference record, would have the opportunity to implement immediate write-through or eventual/fence-driven write-back.

I might be too limited in my thinking but I think a combination of implicit conversion (for reads) and = (for writes) is sufficient for the reference record. Such features are a bit related to the problem of reducing GMP temporaries for things like a = b + c where these are all GMP numbers. Additionally I know we talked about "smart references" in the context of parallel safe list/map (but I am not finding the issue right now).

So anyway I wonder if a reasonable next step would be to sketch out some reference records and start to develop some language support ideas for them & see if we can get the job of implementing those not to be too hard.

bradcray commented 3 years ago

To be clear, the MCM-aware user-facing data structure idea would combine with some sort of reference record type - where the data structure author, in implementing the reference record, would have the opportunity to implement immediate write-through or eventual/fence-driven write-back.

I mostly understood this, but don't understand how write-through would work. I was imagining it would need to be lazy / eventual write-back on the assumption that the thing that would trigger the "write*" method to be called would be the compiler generating a memory fence/flush style operation according to the MCM model. So that suggested to me that I couldn't write the type to do it eagerly, but would be at the mercy of when the compiler believed my write should/could occur?

I might be too limited in my thinking but I think a combination of implicit conversion (for reads) and = (for writes) is sufficient for the reference record.

Oh, I think you're probably right about the implicit conversion for reads, that's a good point. I hadn't thought of that, though you've probably suggested it before in similar conversations.

w.r.t. = for writes, I've been vaguely worried that relying on it wouldn't be general enough to handle all cases, but let me think about why that is... I think the kind of case I'm worrying about is like this:

proc foo(ref x: int) {
  x = 42;
}
foo(A[i]);

where, if A[i] returned some sort of special record type, that record type itself can't be passed into foo(), the = in foo that tries to write to the array element would be trying to call the vanilla int = rather than the one on the record, and implicit conversion presumably wouldn't help things (since it's unlikely to be legal in ref situations). So am I missing something in thinking = is insufficient, or are you?

FWIW, I've started sketching out some of these patterns using records within the contexts of arrays, but have been running into problems getting it threaded through all of the array machinery in the internal modules and compiler. The current barrier is that resolveAutoCopies() was complaining that it couldn't initialize a value of type 't' from my record type 'R(t)', which is why I was asking about it on chat last week.

mppf commented 3 years ago

I mostly understood this, but don't understand how write-through would work.

I'm imagining that the record author can implement operator = on the reference record either to save some writes (for eventual flush on a fence) or to do them immediately.

where, if A[i] returned some sort of special record type, that record type itself can't be passed into foo(), the = in foo that tries to write to the array element would be trying to call the vanilla int = rather than the one on the record

Well, we could imagine replacing the ref that the compiler uses today with a class hierarchy of referency-things. Then, in your example, the x = 42 in proc foo(ref x: int) could call something with virtual dispatch. That could be horrible for performance, though. So maybe ref x: int is actually a kind of generic type and the function gets stamped out with the appropriate type of reference?

bradcray commented 3 years ago

@mppf: Thanks for chewing on this with me.

I'm imagining that the record author can implement operator = on the reference record to either to save some writes (for eventual flush on a fence) or to do them immediately.

Are you imagining that this choice between write-through or write-back would be opted into via some sort of high-level bit or pragma or annotation on the = routine which might be written equally simply either way? Or that the author of the = routine would explicitly create code that would save the values somewhere in that case?

Meanwhile, this idea:

So maybe ref x: int is actually a kind of generic type and the function gets stamped out with the appropriate type of reference?

is feeling a bit like the "implicit conversion in ref contexts" notion that I was brushing up against in my previous comments and sensed was a bit too weird. Or at least, fairly different from how Chapel works today. I wonder whether there's a precedent for it in other languages... Where I guess the difference between what I said and what you are thinking is that I was imagining somehow squinting at the record to make it an int that was ref-able, whereas you seem to be thinking more in terms of specializing the function to make it take things that can pose as ref-to-ints?

mppf commented 3 years ago

Are you imagining that this choice between write-through or write-back would be opted into via some sort of high-level bit or pragma or annotation on the = routine which might be written equally simply either way? Or that the author of the = routine would explicitly create code that would save the values somewhere in that case?

That they'd explicitly create code to save the value somewhere, for write-back; and that they'd start the write immediately, for write-through.

Regarding ref and implicit conversions... I think the "normal" way to handle it is with implicit conversions and = overloads; but here I'm probably just describing how you'd do it in C++. I know Rust has some sort of "smart reference" type but I don't know if they have more support for conversions.

Revisiting the example though:

proc foo(ref x: int) {
  x = 42;
}
foo(A[i]);
// Q: What can A[i] return to make this work?

I wonder if it would be acceptable to lean on other language features (namely, return intent overloading and out intent) to handle this. If we wrote this:

proc foo(out x: int) {
  x = 42;
}
foo(getAElt(i));

proc getAElt(i: int) ref {
  // return some sort of type that exists only to be written to with `=`
  return new writeHandler(A, i);
}
proc getAElt(i: int) const ref {
  // do the regular thing we do today
  return A[i];
}

What seems good about this:

What isn't working:

bradcray commented 2 years ago

As an update on this: One of the groups who was trying to do this has gotten a prototype working that is promising, but not quite as seamless as ideal. The approach taken was essentially:

The main downside of this approach so far is that when the array access is on a RHS, like var x = A[i];, the class is also returned, but isn't of type eltType as expected. So the workaround has been to:

This is sufficient for experimentation, but obviously not for the long-term. It also causes problems for composition. For example, other array operations that call dsiAccess() or read the array need the extra parentheses as well (but only for the cases where the array is implemented by this type).

Briefly, I was thinking "Oh, but we could have the _array.this() overloads that read return eltType rather than a class," but then I remembered that, currently, both reads and writes are going through the same _array.this() overloads due to our need to disable the ref-based overloads. So my main two thoughts here are:

Of these two, the second seems attractive in terms of its orthogonality with paren-ful this methods, but more than a little problematic since it's so syntactically invisible (when would naming such an object not result in a read? How would other things be done with the object? [edit: Or are these non-issues? Is the idea that any non-ref access to the class would resolve to such a read? I feel this would benefit from more thought]).

The first seems attractive conceptually, but challenging to know how to express. On the other hand, if we did come up with a way of expressing it, it might eliminate what is currently a subtle and fairly unique case (return intent overloading).

Or, maybe there's some other cool solution that I haven't thought of yet.

mppf commented 2 years ago

Coming back to one of the original challenges:

in one, the array is composed of two distinct memories: a "main" memory that represents the array elements but that cannot be directly indexed/accessed; and a "local" memory that serves as a cache for the array memory and can be written/indexed, but then has to be copied back to the main copy to be considered complete. With our current interface, there's no hook for causing that "write back" to occur.

Here, we can return a ref to an array element in the local memory. The problem is that we need to know when the write to that data has been completed. With the idea of returning a custom type instead of a ref eltType, I think the main thing we are achieving is that the deinit on that custom type is a place we can mark the operation complete.

So, this gets me thinking that maybe we could keep the ref return for the type system but also add some extra default arguments whose values the compiler ignores?

for example

proc getRefToElt(i) ref : eltType { ... }
proc getArrayElt(i, ref _realRef=getRefToElt(i), _scope=new endOfRef(_realRef)) ref : eltType {
  return _realRef;
}
proc getArrayElt(i) const ref : eltType {
  ...
}

proc userCode() {
  ref x = getArrayElt(1);
  // creates temporary _scope to be deinited at end of block
  x = 12;
  // now _scope is deinited and finalization can occur

  getArrayElt(2) = 24; // temporary _scope is deinited after this line
}

I think the main thing I am worried about with the above approach is that we wouldn't be able to write something returning a ref:

proc getRef() ref {
  return getArrayElt(1); // uh-oh, _scope will be deinited here
}

So I guess that means that we can't really split the scoping from the ref -- we have to operate on the "scope" type and somehow treat it like a ref.

Separately, one thing I find a bit unsatisfying about the return intent overloading is that it requires that the collection default-initialize a new element in order to return it by ref. In contrast it seems more appealing to me to return some kind of thing like ref but that includes a promise that the value will be initialized (that is checked by the compiler). Then, we can skip the default initialization. We could imagine representing this uninited-ref with a record, and that record could have a field with a helper to run once the value is initialized.

Anyway, regardless of the uninited-ref idea, my own opinion is that we need some kind of way to make a custom ref. The compiler would need to treat it as a ref in the usual way but the implementation could be user defined (as a record, most likely).

Continuing along this line, I think the main choice ahead of us is, for a reference to int, is there going to be just one type for that within the compiler, or will there be multiple? If there are multiple, I think we'd have to make a lot of things generic that aren't today. If there is one, I think we would need the "hooks" available within a custom reference type to be implemented with something like virtual dispatch, which might have performance implications, since we use things like ref int all over the place and it is probably not a good idea to add a conditional to each one. Maybe we could have just 2 reference types -- a custom one (with hooks) and the standard one. (Or maybe 3-4, if "uninited-ref" is a thing we pursue).

mppf commented 2 years ago

come up with some other means of differentiating between read/write overloads other than relying on the presence/absence of a ref overload, so that such arrays could still have distinct read/write code paths?

This got me thinking about an alternative to return intent overloading to achieve the same goal (at least, for arrays).

I call the idea "compound operators" and I don't remember if it's come up before or not but I'm pretty excited about the possibility.

We want to be able to differentiate patterns like this

A[i] = f(args);

from patterns like this

g(A[j]);

What if we could express A[i] = as its own operator?

Here I am imagining that the array implementation for A could provide:

// this is a "compound operator"
// it matches code that *syntactically* has the form A[i] = rhs
// by defining a function body for the combination of proc this and operator =
operator myarray.this(i: int).=(rhs) {
  // a "for instance" implementation doing something nontrivial
  lockElt(i);
  getOrCreateEltRef(i) = rhs; // assign element
  unlockElt(i);
  // could do "write back" here, too
}

// this is the regular access operator that is used if the above doesn't match
proc myarray.this(i: int) {
  haltIfBoundsCheckFails(i);
  return getEltRef(i);
}

(As an aside, I'm tempted to also allow operator myarray.this(i: int).=(in rhs) and call that one if the RHS is already a temporary, to allow a data structure author to be able to handle A[i] = f() adding a new element with move-initialization only).

What I think is interesting about this idea is that it seems general enough to help with a number of other issues:

What are some issues with this approach?

Despite these issues, I think the idea could go somewhere useful.

aconsroe-hpe commented 2 years ago

What if we could express A[i] = as its own operator?

This makes me think of Python's __setitem__ which is a function of (self, key, value) where key could be (it can be anything, but in the case of the arrays, the interesting ones are):

and value could be an int|slice|array

You do miss out on the ability to take a ref and pass it around. But I think you could manage with something like

var view = new view(arr, 10..20, 3);
foo(view);
proc foo(arr) {
  var x = arr[0];
  arr[10] = x * x;
}

The caching discussion makes me think of aggregators and managers, where you might imagine something like

manage writeBack(arr, bufsize=16) as carr {
  carr[10] = 20;
  // reading arr[10] would give the old value
  carr[20] = 30;
} // end of scope flushes writes

// replicator
manage replicateWrites(arr1, arr2) as rarr {
  rarr[10..20] = 20; // writes to arr1 and arr2
}

The rewrites to avoid temporaries in a = b + c are very interesting and make me think of Haskell's rewrite rules.

mppf commented 2 years ago

The rewrites to avoid temporaries in a = b + c is very interesting and makes me think of Haskell's rewrite rules.

I looked at that page but I'm not quite following what is happening there. Is there compile-time evaluation going on? Or just AST transformation?

There also was a lightning talk in LLVM-HPC 2021 about this -- by Alister Johnson "Automatic and Customizable Code Rewriting and Refactoring with Clang". They showed this example which specifies how to rewrite a CUDA kernel launch as a HIP one:

[[clang::matcher("kernel_call")]]
auto cuda_kernel(int numBlocks, int numThreads) {
    auto arg1, arg2;
    {
        kernel<<<numBlocks, numThreads>>>(arg1, arg2);
    }
}

[[clang::replace("kernel_call")]]
auto hip_kernel(int numBlocks, int numThreads) {
    auto arg1, arg2;
    {
        hipLaunchKernelGGL(kernel, numBlocks, numThreads, 0, 0, arg1, arg2);
    }
}

What I found exciting about this talk is that it's a pretty simple interface for a user to write some AST match & transform. (The clang::matcher part specifies the pattern to search for and the clang::replace part indicates what to replace it with).

We could consider using a very general strategy along these lines for the specific problem in this issue. I think the main challenge to doing something extremely general is that it can interfere with a programmer's ability to understand their code. At least with the combinations-of-operators idea, the behavior of the code at the call site is still (arguably) understandable (without knowing all of the details of all of the implemented combinations). In contrast, with a general rewriting strategy, we'd have to be more aggressive about saying something like "Use this feature very carefully or users of your library will be confused".

mppf commented 2 years ago

Off-issue, @bradcray was asking about whether some of the solutions here could help with optimizing operations on bigints (where the main issue here is wanting to avoid extra allocations - i.e., reuse existing bigints for the results).

How would the "compound operators" idea described above in https://github.com/chapel-lang/chapel/issues/17999#issuecomment-976579379 handle this?

The BigInteger module would have some sort of proc/operator/something that is a single call that the compiler will make for b1 = invert(b2, b3). For example it might have operator bigint.=(b1).invert(b2,b3). Similarly, for b1 = b2*b3, the compiler could call operator bigint.=(b1).*(b2,b3).

Note that we already effectively have both the compound version and the non-compound version of the operators like * (mul is the compound version and * is the non-compound version). If we went this way with BigInt we would probably want to implement both forms for all of the methods. I suppose we could have the compiler be willing to create a temporary and call a compound operator to implement a non-compound one, but IMO that reduces the clarity of the compound operators language idea.

aconsroe-hpe commented 2 years ago

I wonder if we can do this today with assignments participating in overloads so that

x = f(a, b);
// looks for overload
f(a, b, out x); // maybe this needs to be out or ref or ref out?

A[i] = b;
// looks for overload
operator=(b, ref A, i);
mppf commented 2 years ago

Right, but what bugs me about that is that it's asymmetric. In the first case, we have a special f, but in the second case, we have a special operator=. But both are doing some combination of = and something else. In particular, why would operator= necessarily have a 3rd argument that means element access? What if I wrote f(i) = b - isn't that just as (syntactically) reasonable to want? The compound operators idea handles these things symmetrically and in a general way -- f(i) = b could call an operator someType.f(i).=(b).

aconsroe-hpe commented 2 years ago

In particular, why would operator= necessarily have a 3rd argument that means element access?

I'm imagining it would just be one possible overload and it wouldn't "mean" anything in particular; that is up to the definition. I would think we'd also need A[i, j] = b to go to operator=(b, ref A, i, j) (or maybe in tuple form).

What if I wrote f(i) = b - isn't that just as (syntactically) reasonable to want?

Good question and I think in what I've proposed, that would only look for an overload if f is not a function. That rule breaks down with methods so I'm still thinking about what A.foo(i) = b; would do.

EDIT: extra thought is that in the bigint example, the function is already written like invert(a, b, out/ref c) so you could save the step of also having to implement the chained operator thing.

dlongnecke-cray commented 2 years ago

TLDR: While smart references seem like a valuable idea, I think they probably aren't the most natural solution to this problem. Since most of the fine details here concern the collection, we should probably just let the collection itself sort things out with a "compound expression" operator instead of delegating to a smart reference and creating a dependency.

I've read some of the backlog here and it seems like thoughts have gone in one of two directions:

I have been thinking about this w.r.t. context managers and aggregation (apparently I should have paid more attention to this thread...). I've entertained the idea of something like:

record smartRef {
  // What information goes in here to make this useable? Is it data-structure dependent? E.g...
  // Need to know the key to commit the write later? Same for value?
  type K, V;
  type M;
  var ptrSlot: c_ptr(M.slotType); // Get a pointer to the actual slot in the map...or some sort of aggregation buffer.
  // We store the key and value here rather than in ourselves? Implementation detail...

  // This is pretty much the exact same setup as what Brad described above a few months ago.
  // Use 'this' for reads, but it's clunky...
  operator this(...) {}

  // Use 'operator=' for "writes"...
  operator =(lhs, rhs) {
     // Communicate with the map to do some sort of aggregation...
  }

  // But now how do I let this reference behave like plain-old-references? I tried forwarding, no luck.
  forwarding v;
}

// Q: can a smart reference make this be both task-safe and handle aggregation?
manage myMap.guard(k) as v do v = 1;

This is pretty much what Brad described in: https://github.com/chapel-lang/chapel/issues/17999#issuecomment-974488681

This code is all over the place, but basically I wanted to try and spruce up the manage foo.guard() pattern (which was originally intended to lock on a value and return a reference) so that in addition to making it task-safe, it could also do aggregation as well. I thought that "smart-refs" could be a way to indirectly aggregate writes to a reference. After wrestling with this, my conclusions were:

While I still think smart references could be a useful tool and it would be nice for the language to be able to support them, I think having a sort of "compound expression" e.g. operator this.=(i: int, ref rhs) as Michael illustrated above, would probably allow us to solve this problem in a more simple, natural way.

manage myMap.aggregate(style, bufsize, moreConfigStuff) do
  forall i in 0..hi do
    myMap[i] += 1;

Additionally we can still introduce managers to allow a collection to adopt more coarse-grained locking, and can combine that with aggregation as well.

mppf commented 2 years ago

One thing occurred to me about the compound operators idea.

Suppose you want to do something like this to your array element:

A[i] = f(A[i]);

Today we can write this like so:

ref elt = A[i];
elt = f(elt);

to do the array access only once. That might be important for some atomicity reason. (E.g. if the elements are atomic values, we can imagine something like elt.add(1) instead of this pattern). But, this pattern does not work with the compound operators idea (when you return a reference to the element, it basically falls out of what compound operators can support).

So, that is where I would imagine an update method or a context manager.

The update method is something like this:

A.update(i, lambda (ref elt) {elt = f(elt);});
// Or, if we combine the update with the compound operator, we could write
A[i].update(lambda (ref elt) {elt = f(elt);});

The context manager way of doing the same thing looks like this:

manage A.element(i) as elt {
  elt = f(elt);
}

@dlongnecke-cray - I am curious, can a context manager that returns v by value in this case access the value of v at the end?

Anyway, I think that the context manager could return a regular reference to elt in some local aggregation buffer; the part where that gets challenging is that if elt already existed in the array we would expect to have a reference to it.

But, one of the main points of the context manager or update function here is that the data structure knows when the process of updating the element is done, so that it can know when to do the "write back" (from the issue's original post).

So anyway that leads me towards thinking:

I think an interesting thought experiment is to think about how we might be able to extend the compound operator idea to unpredictable combinations.

dlongnecke-cray commented 2 years ago

@mppf The managed block can access v until the end of the scope created for the do statement, but not after. So a loop wouldn't be able to access it as the "last statement" (if that's where you're going with this?).

Thanks for pointing out that we still need .element() / .guard() to enable a single atomic transaction, I think you're right that the compound operator couldn't handle nested expressions on the RHS.

I still do like compound operator idea for the simple cases, though. I am an advocate for having as much of the locking and aggregation details in the data structure as possible, which is why I was trying to push back a little against the smart reference idea (because it creates this complex relationship between the data structure and another type).

Anyway, I think that the context manager could return a regular reference to elt in some local aggregation buffer; the part where that gets challenging is that if elt already existed in the array we would expect to have a reference to it.

Actually, I'm wondering if this matters in a world where there are no other existing references to elt and accesses to elt go through the same lock. For a parallel data structure I would expect that

// Here we lock on 'k(ey)', only one task sees `ref v` at any time
manage m.element(k) as ref v do v += 1;

If m.element() is the only way to get a reference to m[k], then v is the only reference to m[k]. Could this help the situation at all?

I think an interesting thought experiment is to think about how we might be able to extend the compound operator idea to unpredictable combinations.

The only thing I can think of right now is some way of wrapping the RHS expression evaluation into some sort of critical section (like an implicit atomic block around the evaluation of the actual) owned by m, or maybe folding its evaluation into the scope of the compound operator. Both of which seem kind of crazy and are implicitly doing what .element() is trying to make explicit.

mppf commented 2 years ago

@mppf The managed block can access v until the end of the scope created for the do statement, but not after. So a loop wouldn't be able to access it as the "last statement" (if that's where you're going with this?).

It would indeed be nice if the unordered-forall optimization / automatic aggregation optimization could apply. But I imagine that these would need to be communicating with the data structure (as I would imagine that the data structure needs to own and manage the aggregation buffers, etc). As a result, maybe it would take a different form in the implementation. For example, the compiler might add an argument to the compound operator calls / the enter&leaveThis calls (or something?) to indicate that the loop pattern means the order of these operations doesn't matter.

But anyway the alternative is for the compiler to create the code to be done in an unordered/aggregated way (which is more what #12306 gets to). But to my tastes at the moment, I think it would be better for the data structure to be involved, but benefit from the conclusions of the compiler's analysis. (Basically because the data structure knows more about what is going on than the compiler does).

I'm not super confident that either approach is the best/right one, though.

mppf commented 2 years ago

I am idly wondering if we would be able to remove the return-intent-overloading feature entirely, if we had some success solving this issue in a different way.

mppf commented 2 years ago

This is kind of re-hashing things already discussed above, but I wanted to post a comment summarizing how I currently feel about the compound operators idea.

I am still pretty optimistic about the compound operators idea described above. I think the main drawback of it is that it's not inherently clear where to draw the line between combinations that are handled normally through multiple calls and combinations that are a compound operator. That comes up in a few places: 1) when understanding some code calling array (say) operators/functions/methods, and 2) when implementing an array type or some more general data structure, deciding which things need to be supported as compound operators.

Due to those limitations, I don't think the compound operators idea can by itself address the histogram race condition (see issues #19102 and #8339).

However, I still think that the compound operators idea can be very valuable to us. We just have to decide where to draw the line, so to speak, in terms of which things are implemented as compound operations for array types / more general collections.

So, my straw proposal is