Support copy-elision and split-init across `on this` blocks

dlongnecke-cray commented 4 years ago

As a Chapel user, I want the compiler to support copy-elision for in intent arguments and split-init for variables across on this blocks when it is safe/reasonable to do so.

copy elision

While working on #15772 I wrote a test to ensure that copy-elision was taking place correctly and that exactly one copy of each argument was produced per call to set.add(). What I found was that there was an extra copy produced due to an on this statement in the body of set.add().

Here's a short example for when copy-elision across on this blocks would be useful:

record r { var x: int = 0; }
proc r.deinit() { writeln("deinit"); }

record mySet {
  type T;

  // Whatever backing store you are using for your container, e.g. _ddata...
  pragma "no init"
  var _store: T;

  proc _addElem(in elem: T) {
    // Add the element to the table (ideally move it into place).
    // The table's add method will probably have the `in` intent as well,
    // which makes it a candidate for copy elision.
    _store = elem;
  }

  proc add(in elem: T) {
    // The compiler can't move `elem` across this boundary,
    // so it has to make a copy.
    on this {
      _addElem(elem);
    }
  }
}

proc test() {
  var x = new r(128);
  var s = new mySet(r);
  s.add(x);
}
test();

This will output:

deinit
deinit
deinit

The first copy is made due to the in intent on mySet.add. The second copy is made when passing elem as an argument to _addElem. Because the call to _addElem exists within an on this block, a temporary is copy initialized from elem within the block, even though it would be a candidate for copy elision otherwise.

What's frustrating is that this copy is made even on programs compiled for COMM=none - however I understand that this might be to prevent unexpected behavior when recompiling a single-locale program for multi-locale.

It is important that only one copy of an element is made when adding it to a container, both for the sake of efficiency and to avoid visible side effects that may occur when a copy-initializer or deinitializer is called.

For me, the solution was to remove the on this block from .add, but that is frustrating because it can trigger a lot of remote communication if .add is executing on a different locale from this. In addition, there may be additional latency if code tries to non-locally take locks.

split-init

Here's an example of when split-init across on this blocks would be useful:

record r { var x: int = 0; }

record myBackingStore {
  type T;
  var _data = new T();
  inline proc size { return 1; }
  inline proc data ref { return _data; }
}

record mySet {
  type T;

  // Whatever backing store you are using for your container, e.g. _ddata...
  var _store = new myBackingStore(T);

  inline proc _enter() {} // Lock this set.
  inline proc _leave() {} // Unlock the set.

  proc toArray() {

    // We have to take these locks outside the on block because we need
    // to get `.size` from `_store`. If we supported split-init across
    // on blocks, then we could take these locks within the `on this`.
    _enter(); defer _leave();

    // Initialize our result outside the on block.
    var result: [0..#_store.size] T;

    on this {
      // If we supported split-init across on blocks, then we could just use
      // an initialization expression for `result` instead of making a
      // temporary here.
      var tmp: [0..#_store.size] T;

      // Copy over the element(s) of the store to the array...
      tmp[0] = _store.data;

      // Copy temp array outside of the on block.
      result = tmp;
    }

    return result;
  }

}

proc test() {
  var s = new mySet(r);
  var arr = s.toArray();
  writeln(arr);
}
test();

Because you cannot return from on blocks, it is necessary to declare the result value outside of the block and then declare a temporary from within the on block. After doing real work on this, the temporary is copied back out (potentially to a different locale) to result.

However if your container has to take locks (as is done in this example), it must take them outside of the on block in order to i.e. safely read the size of the store. This allows for code that takes the lock non-locally, resulting in communication and latency when taking what might be a simple local spinlock.

If split-init was supported across on this blocks, -OR- if one could return values from within on blocks, it would eliminate the need for a temporary and allow us to take the locks on the container locally.

mppf commented 4 years ago

This issue is related to #15676.

mppf commented 4 years ago

One piece of important context here is that in #15739 I changed the hashtable implementation to rely on split-init and copy-elision features rather than using pragmas. @dlongnecke-cray attempted the same thing in the set implementation and was successful in doing so - with the exception of this issue.

mppf commented 4 years ago

Note that if we did this it would be a breaking language change.

Edit - well, for split-init, we could consider code like this:

var x;
on something {
 x = ...;
}

To be opting in to split-init across an on statement. But changing the behavior of this

var x: SomeType;
on something {
 x = ...;
}

would be a breaking change.

mppf commented 4 years ago

  proc add(in elem: T) {
    // The compiler can't move `elem` across this boundary,
    // so it has to make a copy.
    on this {
      _addElem(elem);
    }
  }

This seems like something that we could just have a function that does this and opts in to doing it across locales. For example:

  proc add(in elem: T) {
    // The compiler can't move `elem` across this boundary,
    // so it has to make a copy.
    on this {
      _addElem(moveAcrossLocales(elem));
    }
  }

e-kayrakli commented 3 years ago

On gitter @hokiegeek2 expressed wanting to have something like

 on Locales[idx+1] {
     leadingSlice = localeArray[localStart..localLeadingSliceIndex];  
     //write to data structure that I can retrieve later in another code block
}

//later in some code block...

 on Locales[idx+1] {
     //retrive leadingSlice and do something with it
}

and my mind went to split-init, but it didn't work. @mppf pointed me to this issue. I didn't think much about the copy elision part to express an opinion but split-init across on statements would be nice.

FWIW, I tried it with snippet:

var myArr = [0,1,2,3,4,5];

ref mySlice;
on Locales[1] {
  mySlice = myArr[3..5];
}

on Locales[1] {
  writeln(mySlice);
}

bradcray commented 3 years ago

@e-kayrakli: I think we were trying the same thing in parallel. :)

I agree that this seems attractive; the one part about it that I think might not be sufficient (or, at least, ideal) for @hokiegeek's question and code is that it seems that the ideal would be for mySlice to live on Locales[1] rather than simply being initialized and accessible from locale 1 (where, if it's a reference, maybe it's not a super-big deal, but my impression was that he wanted to do a deep-copy snapshot of the slice, which would be more of a big deal). That leads my mind to the intended-from-day-1-but-never-implemented feature of declaring remote variables:

on Locales[1] var x: int;   // x is within this scope, but not on this locale

and using them with split init, like you were saying:

on Locales[1] var mySlice;
on Locales[1] {
  mySlice = myArr[3..5];
}
...
on Locales[1] {
  writeln(mySlice);
}

mppf commented 3 years ago

I'm noting here that PR #18711 is adding a workaround for problems encountered with the pragma "no init" type workaround for this problem. Ideally we could express it with calls to functions in Memory.Initialization but that is not possible today.

Lastly, for the recent discussion about using on SomeLocal var x: someType to allow split init of that within an on SomeLocal -- I think this does not address @dlongnecke-cray's original problems. I don't think it was intended to address the copy-elision case (with proc add). Is it expected that it could address the split-init case for proc toArray?

I suppose it might look like this?

 proc toArray() {

    // We have to take these locks outside the on block because we need
    // to get `.size` from `_store`. If we supported split-init across
    // on blocks, then we could take these locks within the `on this`.
    _enter(); defer _leave();

    // Initialize our result outside the on block.
    on this var result: [0..#_store.size] T;

    on this {
      // split init
      result = _store.data;
    }

    return result;
  }

Also, about the idea more broadly. Would the compiler need to prove that the on _ var locale matched the on _ { locale? Isn't that a runtime-only piece of information?

For these reasons I am leaning towards either: a) deciding that it's OK to do split init and copy elision across on statements (but with some way for types to opt in to making adjustment when that happens per #15676); or b) allowing the user to opt-in to split init and copy elision across an on statement by calling some special routines in Memory.Initialization that indicate to the compiler it is OK.

mppf commented 2 years ago

Following an idea in https://github.com/chapel-lang/chapel/issues/19617#issuecomment-1104555638 -- we could use with (in a) or with (out b) to enable copy elision or split init, respectively. This is nice because it is not a breaking language change (because it is something you have to opt in to - and intents for on don't exist yet).

An interesting question is whether or not this would apply also to coforall ... with (in a) (or out) and there it cannot because we would expect there to be multiple tasks. Maybe it could apply to cobegin though with special rules (but that is the subject of #19617).

Copy Elision

  proc add(in elem: T) {
    on this with (in elem) {
      _addElem(elem);
    }
  }

Split Init

 proc toArray() {
    var result;
    on this with (out result) {
      _enter(); defer _leave();

      result = localDataToArray(_store.data);
    }

    return result;
  }

jhh67 commented 2 years ago

I ran into this same issue in channel.readBytesOrString, causing an extra copy of data that are read from the channel. It would be great to have this fix and eliminate the extra copy.

chapel-lang / chapel