chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.78k stars 419 forks source link

syntax to return slice referencing global array elements #12178

Open mppf opened 5 years ago

mppf commented 5 years ago

This issue is pulled out of "What should the semantics be when returning an ArrayView?" #5341. (There is a lot of discussion and motivation on that issue - this issue exists to summarize the current proposal succinctly).

Currently, this program

var A: [1..10] real;
proc returnsArray() {
  return A[1..4];
}

returns a copy of the array slice (and so returns a non-view DefaultRectangular array). The question is - what could a user do if they didn't want that copy?

The proposal here is that ref return intent for an array slice means "return array elements by ref".

Specifically, this program would be in error:

proc returnLocalSliceByRef() ref {
  var A: [1..10] real;
  return A[1..3];
}

since it conceptually returns an array with elements that have already been destroyed.

The following program would be legal and would return an array slice referring to the same elements as the global variable A:

var A: [1..10] real;
proc returnNonlocalSliceByRef() ref {
  return A[1..3];
}
mppf commented 5 years ago

Tests to update if this is addressed somehow:

bradcray commented 5 years ago

Grepping for pragma "no copy return" is another way to find tests that run into this issue (I believe by doing so I found some not in Michael's list above). I think this issue could likely be considered for general returns of array expressions even without slicing (though slicing is the most common case where this seems to come up in practice). For example:

var B1, B2: [1..100] real;

proc getNextBuffer(it: int) ref {
  if it % 2 == 0 then
    return B1;
  else
    return B2;
}

(of course, there are other ways to write this pattern without returning an array by ref, but it seems as though someone ought to be able to write patterns like this).

mppf commented 5 years ago

@bradcray - I believe the last example you wrote works today (once you rename the iter variable).

Another thing to think about, besides slices, is if users should be able to opt-in to returning an iterator record / iterator expression rather than turning it into an array.

bradcray commented 5 years ago

Oh, OK. I was thinking we didn't have a way to return an array by ref today short of using a pragma. I must've been misremembering (I've confirmed that it works).

should [users] be able to opt-in to returning an iterator record / iterator expression rather than turning it into an array.

I definitely think so. By mentioning it on this issue, are you suggesting that you think ref might be a reasonable way to do so?

mppf commented 5 years ago

By mentioning it on this issue, are you suggesting that you think ref might be a reasonable way to do so?

That's unclear to me, but I think it's worth considering at least.

vasslitvinov commented 2 years ago

One tricky case with this proposal is when returning a pre-existing slice by ref, for example:

var A: [1..10] real;
ref slice1 = A[1..3];
ref slice2 = A[2..4];
proc pickSlice(arg) ref return if arg then slice1 else slice2;

Should pickSlice() return a shallow copy of the chosen slice or a ref to it? The considerations are:

Looking through the discussion in #5341, I like the motivation that ref intent -- essentially this proposal -- is what users resort to intuitively (Nikhil, Tom).

I disagree with the reasoning that the return intent is what we want applied to array elements. This may or may not be intuitive, however this is inconsistent with what we do for the default argument intent, for example for an array of reals.

mppf commented 2 years ago

Another case that is bugging me with the "ref means array elements by ref" strategy is that I don't see how to generalize it to having an initializer that initializes a domain field to sometimes alias another domain and sometimes be a new domain (which is coming up in the context of Arkouda - more details on that specific situation to come).

bradcray commented 2 years ago

Should pickSlice() return a shallow copy of the chosen slice or a ref to it?

I would say a shallow copy of the chosen slice. For comparison's sake, when the user writes:

var A: [1..10] real;
var B: [1..10] real;
proc pickArray(arg) ref return if arg then A else B;

I'd similarly expect us to return not a ref to the array record but a new descriptor that refers to the _value classes of A or B (?).

will be creating lots of slices instead of just manipulating pointers. This situation is unlikely, however should be considered.

I expect it not to be creating a lot of slices so much as creating records with class fields that alias existing classes. So small struct manipulation rather than the equivalent of running A[2..9], say.

This may or may not be intuitive, however this is inconsistent with what we do for the default argument intent, for example for an array of reals.

Can you say more about what feels inconsistent to you? I feel as though we've traditionally described an array's value as being "the values of all of its elements", such that passing an array by ref means that you're aliasing all of the original elements. And I think this is what the default argument intent does from a user's perspective, regardless of how it's implemented. But you may be seeing something that I'm missing.

that I don't see how to generalize it to having an initializer that initializes a domain field to sometimes alias another domain and sometimes be a new domain (which is coming up in the context of Arkouda - more details on that specific situation to come).

I'm not understanding this case either, so will look out for the further details. It sounds like a case for ref fields to me, and/or possibly ref(?) fields that can switch between ref and var based on a param value.

vasslitvinov commented 2 years ago

Brad - good points. In your example:

var A: [1..10] real;
var B: [1..10] real;
proc pickArray(arg) ref return if arg then A else B;

today pickArray() returns a straight address of the _array record of either A or B. This perhaps reveals our bias towards viewing an array as an object, rather than just a collection of elements. (Or maybe it doesn't.)

a program where the above pickSlice() is invoked in a tight loop will be creating lots of slices instead of just manipulating pointers

I agree pickSlice() would not calculate a slice from scratch the way the expression A[2..9] would. However, it would create an array (with an ArrayView domain map). I expect that array creation will increment the array counter in the respective domain. This is what makes me concerned about performance impact.

I disagree with the reasoning that the return intent is what we want applied to array elements. This may or may not be intuitive, however this is inconsistent with what we do for the default argument intent, for example for an array of reals.

When passing an array of reals to a function by the default intent, the rule "the argument intent is applied to array elements" would entail passing each element by const in, i.e., copying the array. The current rule instead passes the array by reference. To be fair, I do not consider the reasoning Brad states and the inconsistency I see to be major players in choosing the solution.

vasslitvinov commented 2 years ago

Making my thinking explicit, I am generalizing this issue to defining new syntax that, together with existing features such as existing value and ref return intents, will enable users to choose among the following semantics:

Furthermore, users will be able to make these choices:

Ideally both "by value" semantics will be available simultaneously, that is in different branches of a single conditional or in different return statements of a single function. I doubt the virtue of mixing by-value and by-reference semantics in these "simultaneous" situations because I do not see how the recipient of such a mix, ex. the caller of such a function, can handle it.

bradcray commented 2 years ago

I expect that array creation will increment the array counter in the respective domain. This is what makes me concerned about performance impact.

For your pickArray() case, that could be optimized away by seeing that the arrays are at module scope and don't need to be reference counted...

When passing an array of reals to a function by the default intent, the rule "the argument intent is applied to array elements" would entail passing each element by const in, i.e., copying the array. The current rule instead passes the array by reference.

I'm still not following... what rule are you citing here?

return an array slice as ... a new slice by value, or as an existing slice by ref return a domain ... as a domain (also by value) that aliases another domain, or as a reference to an existing domain

How would a user distinguish between these cases? What would motivate them to do so?

To me, this general direction seems like it adds a lot of complexity that hasn't been requested and doesn't seem clearly motivated...

vasslitvinov commented 2 years ago

The difference between returning a slice or a domain by value vs. by reference lies in what the user can do with them. If the user has a reference, they need to worry about its lifetime. If the user has a value, they can store it in a field, return it out by value, ...

Somehow I feel uneasy about taking away the ability for the user to return a slice (or domain) by reference, as this proposal suggests.

mppf commented 2 years ago

Another case that is bugging me with the "ref means array elements by ref" strategy is that I don't see how to generalize it to having an initializer that initializes a domain field to sometimes alias another domain and sometimes be a new domain (which is coming up in the context of Arkouda - more details on that specific situation to come).

This came out of a discussion with Elliot and he has made issue #20164 for it which has a bit more detail.

I'm not understanding this case either, so will look out for the further details. It sounds like a case for ref fields to me, and/or possibly ref(?) fields that can switch between ref and var based on a param value.

I think that in principle, it might be possible to do with ref fields. What worries me about that is that for the domain case, we might have ref x: domain() end up incrementing/decrementing a reference count.

But more broadly what makes me nervous about the "array/domain references are special" proposal is:

  1. Aren't we adding yet another special language feature for array and domain types? Would user-defined types want the same feature?
  2. The strategy complicates the compiler's implementation of memory management. Maybe it is complexity that we just have to have for array/domain, but these things are simple and consistent across all record types when we are talking about ref vs not a ref.
vasslitvinov commented 2 years ago

[After all, I am removing this comment as my reasoning was misguided. Thanks to @mppf for clarifying:

]

@bradcray wrote above in https://github.com/chapel-lang/chapel/issues/12178#issuecomment-1175630292 :

when the user writes:

var A: [1..10] real;
var B: [1..10] real;`
proc pickArray(arg) ref return if arg then A else B;

I'd similarly expect us to return not a ref to the array record but a new descriptor that refers to the _value classes of A or B (?).

This looks questionable to me. Since this is a new descriptor, it has to be returned by value. Then, the following addition to the above code:

var C = pickArray(1);

will make C an alias to A. This is new to Chapel because now two distinct variables contain the same array.

I am open to this behavior. However I suggest making such sharing explicit: https://github.com/chapel-lang/chapel/issues/20164#issuecomment-1205996503

[ADDED] The proposal in this issue's OP already creates such behavior. Indeed, the following code makes C contain essentially the same array as A:

var A: [1..10] real;
proc pickSlice() ref return A[..];
var C = pickSlice();
vasslitvinov commented 1 year ago

@bradcray or others -- we need to resolve this question: in the following code:

var AA: [1..9] int;
ref BB = AA[1..4];

proc sli1() ref {
  return BB;
}

proc sli2() ref {
  return AA[1..3];
}

proc sli3(ref arg1, ref arg2) ref {
  return if someFlag then arg1 else arg2;
}

proc sli4() ref {
  return sli3(BB, AA[1..3]);
}

ref r1 = sli1();
ref r2 = sli2();
ref r4 = sli4();

think which of the call temps for the calls to sli*() need to destruct their ArrayViewSliceArr instance. This is an easy question.

The challenge is: what should sli3 and sli4 do to make it happen?

spoiler alert Which of the call temps for the calls to sli*() need to destruct their ArrayViewSliceArr instance? Clearly not the temp for sli1(), yes for sli2(), yes for sli4() only when someFlag == false. What is the rule that will implement this? Clearly sli1() will return an _array with _unowned=true, sli2() with _unonwed=false, sli4() will depend on someFlag. Then, what should be _unowned for the _array returned by sli3() and can that be translated to _unowned for the return value of sli4() ?
mppf commented 1 year ago

21233 proposes that array views be considered a special kind of reference in the language. That works well with the concept that the ref / const ref return intent would allow returning one without copying anything.

vasslitvinov commented 4 months ago

Related: #25073, returning myArray.domain by const ref.

vasslitvinov commented 4 months ago

What if we allow records to declare a special initializer that says "if a proc is declared to return me by const ref, create this instance and return it by value instead." Then we can use it here and in #25073.

mppf commented 4 months ago

Yes I'd view that as heading towards the custom / user-defined references idea described in #21233.

e-kayrakli commented 4 months ago

I'll just record a case for this where I added a ref intent to the function and it didn't work.

The case comes from the port of llm.c, where the baseline allocates an N+1 memory, defines two pointers at offsets 0 and 1, and finally pretends those two pointers to be two arrays of size N. Essentially we want the middle chunk to be overlapping between the two arrays. Outline of my Chapeltastic way of doing that was (simplified):

record MyRec {
  var Data: [0..#(N+1)] int;

  inline proc Arr1 ref do return Data[0..#N];
  inline proc Arr2 ref do return Data[1..#N].reindex(0..#N);
}

Ideally, Arr1 and Arr2 should ref fields, but this was my attempt at a workaround.

vasslitvinov commented 3 months ago

This was discussed in https://stackoverflow.com/questions/44974903/how-to-return-a-reference-to-a-slice-of-an-array-in-chapel and animated (!) in https://www.youtube.com/watch?v=9fYhva2P0rg

alvaradoo commented 2 months ago

@mppf Recommended me to add a comment to this issue showing that I ran into attempting to return a const ref of a slice, and not being able to do so. A sample of the code I was working with is below, where A is a block-distributed array. The method getSliceConstRef errors out but getSlice works when prepended with the pragma "no copy return". Credit goes to @vasslitvinov for helping me figure this out.

class ArrWrapper {
  var A;

  pragma "no copy return"
  proc getSlice(u,v) {
    return this.A[u..v];
  }

  proc getSliceConstRef(u, v) const ref {
    const ref mySlice = this.A[u..v];
    return mySlice;
  }
}

I am of the mind that since array A exists within the scope of the class ArrWrapper then we should be able to return a slice of it from a class method as a const ref. Michael explained to me the limitations of being able to do this now, but I wanted to share my experience.