Open mppf opened 5 years ago
Tests to update if this is addressed somehow:
Grepping for pragma "no copy return"
is another way to find tests that run into this issue (I believe by doing so I found some not in Michael's list above). I think this issue could likely be considered for general returns of array expressions even without slicing (though slicing is the most common case where this seems to come up in practice). For example:
var B1, B2: [1..100] real;
proc getNextBuffer(it: int) ref {
if it % 2 == 0 then
return B1;
else
return B2;
}
(of course, there are other ways to write this pattern without returning an array by ref
, but it seems as though someone ought to be able to write patterns like this).
@bradcray - I believe the last example you wrote works today (once you rename the iter
variable).
Another thing to think about, besides slices, is if users should be able to opt-in to returning an iterator record / iterator expression rather than turning it into an array.
Oh, OK. I was thinking we didn't have a way to return an array by ref today short of using a pragma. I must've been misremembering (I've confirmed that it works).
should [users] be able to opt-in to returning an iterator record / iterator expression rather than turning it into an array.
I definitely think so. By mentioning it on this issue, are you suggesting that you think ref
might be a reasonable way to do so?
By mentioning it on this issue, are you suggesting that you think ref might be a reasonable way to do so?
That's unclear to me, but I think it's worth considering at least.
One tricky case with this proposal is when returning a pre-existing slice by ref
, for example:
var A: [1..10] real;
ref slice1 = A[1..3];
ref slice2 = A[2..4];
proc pickSlice(arg) ref return if arg then slice1 else slice2;
Should pickSlice() return a shallow copy of the chosen slice or a ref to it? The considerations are:
ref
slice-returning functions return a ref and other return a slice value may be confusing for the user and complicate the implementation.ref
slice-returning functions return a shallow copy by value, some programs will run slower. For example, a program where the above pickSlice()
is invoked in a tight loop will be creating lots of slices instead of just manipulating pointers. This situation is unlikely, however should be considered.Looking through the discussion in #5341, I like the motivation that ref
intent -- essentially this proposal -- is what users resort to intuitively (Nikhil, Tom).
I disagree with the reasoning that the return intent is what we want applied to array elements. This may or may not be intuitive, however this is inconsistent with what we do for the default argument intent, for example for an array of reals.
Another case that is bugging me with the "ref means array elements by ref" strategy is that I don't see how to generalize it to having an initializer that initializes a domain field to sometimes alias another domain and sometimes be a new domain (which is coming up in the context of Arkouda - more details on that specific situation to come).
Should pickSlice() return a shallow copy of the chosen slice or a ref to it?
I would say a shallow copy of the chosen slice. For comparison's sake, when the user writes:
var A: [1..10] real;
var B: [1..10] real;
proc pickArray(arg) ref return if arg then A else B;
I'd similarly expect us to return not a ref
to the array record but a new descriptor that refers to the _value classes of A or B (?).
will be creating lots of slices instead of just manipulating pointers. This situation is unlikely, however should be considered.
I expect it not to be creating a lot of slices so much as creating records with class fields that alias existing classes. So small struct manipulation rather than the equivalent of running A[2..9]
, say.
This may or may not be intuitive, however this is inconsistent with what we do for the default argument intent, for example for an array of reals.
Can you say more about what feels inconsistent to you? I feel as though we've traditionally described an array's value as being "the values of all of its elements", such that passing an array by ref
means that you're aliasing all of the original elements. And I think this is what the default argument intent does from a user's perspective, regardless of how it's implemented. But you may be seeing something that I'm missing.
that I don't see how to generalize it to having an initializer that initializes a domain field to sometimes alias another domain and sometimes be a new domain (which is coming up in the context of Arkouda - more details on that specific situation to come).
I'm not understanding this case either, so will look out for the further details. It sounds like a case for ref
fields to me, and/or possibly ref(?)
fields that can switch between ref
and var
based on a param value.
Brad - good points. In your example:
var A: [1..10] real;
var B: [1..10] real;
proc pickArray(arg) ref return if arg then A else B;
today pickArray()
returns a straight address of the _array record of either A or B. This perhaps reveals our bias towards viewing an array as an object, rather than just a collection of elements. (Or maybe it doesn't.)
a program where the above pickSlice() is invoked in a tight loop will be creating lots of slices instead of just manipulating pointers
I agree pickSlice() would not calculate a slice from scratch the way the expression A[2..9]
would. However, it would create an array (with an ArrayView domain map). I expect that array creation will increment the array counter in the respective domain. This is what makes me concerned about performance impact.
I disagree with the reasoning that the return intent is what we want applied to array elements. This may or may not be intuitive, however this is inconsistent with what we do for the default argument intent, for example for an array of reals.
When passing an array of reals to a function by the default intent, the rule "the argument intent is applied to array elements" would entail passing each element by const in
, i.e., copying the array. The current rule instead passes the array by reference. To be fair, I do not consider the reasoning Brad states and the inconsistency I see to be major players in choosing the solution.
Making my thinking explicit, I am generalizing this issue to defining new syntax that, together with existing features such as existing value and ref return intents, will enable users to choose among the following semantics:
Furthermore, users will be able to make these choices:
Ideally both "by value" semantics will be available simultaneously, that is in different branches of a single conditional or in different return statements of a single function. I doubt the virtue of mixing by-value and by-reference semantics in these "simultaneous" situations because I do not see how the recipient of such a mix, ex. the caller of such a function, can handle it.
I expect that array creation will increment the array counter in the respective domain. This is what makes me concerned about performance impact.
For your pickArray()
case, that could be optimized away by seeing that the arrays are at module scope and don't need to be reference counted...
When passing an array of reals to a function by the default intent, the rule "the argument intent is applied to array elements" would entail passing each element by const in, i.e., copying the array. The current rule instead passes the array by reference.
I'm still not following... what rule are you citing here?
return an array slice as ... a new slice by value, or as an existing slice by ref return a domain ... as a domain (also by value) that aliases another domain, or as a reference to an existing domain
How would a user distinguish between these cases? What would motivate them to do so?
To me, this general direction seems like it adds a lot of complexity that hasn't been requested and doesn't seem clearly motivated...
The difference between returning a slice or a domain by value vs. by reference lies in what the user can do with them. If the user has a reference, they need to worry about its lifetime. If the user has a value, they can store it in a field, return it out by value, ...
Somehow I feel uneasy about taking away the ability for the user to return a slice (or domain) by reference, as this proposal suggests.
Another case that is bugging me with the "ref means array elements by ref" strategy is that I don't see how to generalize it to having an initializer that initializes a domain field to sometimes alias another domain and sometimes be a new domain (which is coming up in the context of Arkouda - more details on that specific situation to come).
This came out of a discussion with Elliot and he has made issue #20164 for it which has a bit more detail.
I'm not understanding this case either, so will look out for the further details. It sounds like a case for
ref
fields to me, and/or possiblyref(?)
fields that can switch betweenref
andvar
based on a param value.
I think that in principle, it might be possible to do with ref
fields. What worries me about that is that for the domain case, we might have ref x: domain()
end up incrementing/decrementing a reference count.
But more broadly what makes me nervous about the "array/domain references are special" proposal is:
ref
vs not a ref
.[After all, I am removing this comment as my reasoning was misguided. Thanks to @mppf for clarifying:
var C = pickArray(1);
below will insert a copy of the rhs during initialization because the rhs does not "own" the array it is returning, even though the return is de-facto by valuevar C = pickSlice();
below will insert a copy of the rhs because this is what happens whenever initializing from a slice, even upon var C = A[..];
ref
, not in a var
: ref C = pickSlice();
- in which case there is no duality problem]
@bradcray wrote above in https://github.com/chapel-lang/chapel/issues/12178#issuecomment-1175630292 :
when the user writes:
var A: [1..10] real; var B: [1..10] real;` proc pickArray(arg) ref return if arg then A else B;
I'd similarly expect us to return not a ref to the array record but a new descriptor that refers to the _value classes of A or B (?).
This looks questionable to me. Since this is a new descriptor, it has to be returned by value. Then, the following addition to the above code:
var C = pickArray(1);
will make C an alias to A. This is new to Chapel because now two distinct variables contain the same array.
I am open to this behavior. However I suggest making such sharing explicit: https://github.com/chapel-lang/chapel/issues/20164#issuecomment-1205996503
[ADDED] The proposal in this issue's OP already creates such behavior. Indeed, the following code makes C contain essentially the same array as A:
var A: [1..10] real;
proc pickSlice() ref return A[..];
var C = pickSlice();
@bradcray or others -- we need to resolve this question: in the following code:
var AA: [1..9] int;
ref BB = AA[1..4];
proc sli1() ref {
return BB;
}
proc sli2() ref {
return AA[1..3];
}
proc sli3(ref arg1, ref arg2) ref {
return if someFlag then arg1 else arg2;
}
proc sli4() ref {
return sli3(BB, AA[1..3]);
}
ref r1 = sli1();
ref r2 = sli2();
ref r4 = sli4();
think which of the call temps for the calls to sli*() need to destruct their ArrayViewSliceArr instance. This is an easy question.
The challenge is: what should sli3 and sli4 do to make it happen?
ref
/ const ref
return intent would allow returning one without copying anything.Related: #25073, returning myArray.domain
by const ref
.
What if we allow records to declare a special initializer that says "if a proc is declared to return me by const ref
, create this instance and return it by value instead." Then we can use it here and in #25073.
Yes I'd view that as heading towards the custom / user-defined references idea described in #21233.
I'll just record a case for this where I added a ref
intent to the function and it didn't work.
The case comes from the port of llm.c, where the baseline allocates an N+1 memory, defines two pointers at offsets 0 and 1, and finally pretends those two pointers to be two arrays of size N. Essentially we want the middle chunk to be overlapping between the two arrays. Outline of my Chapeltastic way of doing that was (simplified):
record MyRec {
var Data: [0..#(N+1)] int;
inline proc Arr1 ref do return Data[0..#N];
inline proc Arr2 ref do return Data[1..#N].reindex(0..#N);
}
Ideally, Arr1 and Arr2 should ref
fields, but this was my attempt at a workaround.
This was discussed in https://stackoverflow.com/questions/44974903/how-to-return-a-reference-to-a-slice-of-an-array-in-chapel and animated (!) in https://www.youtube.com/watch?v=9fYhva2P0rg
@mppf Recommended me to add a comment to this issue showing that I ran into attempting to return a const ref
of a slice, and not being able to do so. A sample of the code I was working with is below, where A
is a block-distributed array. The method getSliceConstRef
errors out but getSlice
works when prepended with the pragma "no copy return"
. Credit goes to @vasslitvinov for helping me figure this out.
class ArrWrapper {
var A;
pragma "no copy return"
proc getSlice(u,v) {
return this.A[u..v];
}
proc getSliceConstRef(u, v) const ref {
const ref mySlice = this.A[u..v];
return mySlice;
}
}
I am of the mind that since array A
exists within the scope of the class ArrWrapper
then we should be able to return a slice of it from a class method as a const ref
. Michael explained to me the limitations of being able to do this now, but I wanted to share my experience.
This issue is pulled out of "What should the semantics be when returning an ArrayView?" #5341. (There is a lot of discussion and motivation on that issue - this issue exists to summarize the current proposal succinctly).
Currently, this program
returns a copy of the array slice (and so returns a non-view DefaultRectangular array). The question is - what could a user do if they didn't want that copy?
The proposal here is that
ref
return intent for an array slice means "return array elements by ref".Specifically, this program would be in error:
since it conceptually returns an array with elements that have already been destroyed.
The following program would be legal and would return an array slice referring to the same elements as the global variable A: