What should the semantics be when returning an ArrayView?

benharsh commented 7 years ago

Background

In the process of implementing ArrayViews, we encountered problems when returning an ArrayView (slice, rank-change, or reindex). The following code would either leak or segfault:

proc retOrig(A: []) {
  return A;
}
var X : [1..10] int;
writeln(retOrig(X[1..5]));

We then realized that we had written chpl__autoCopy for ArrayViews incorrectly (they in fact did not perform a deep copy). If a deep-copy is performed, we end up with a non-ArrayView type. This causes issues with resolution in our current compiler. The return-type is inferred to be the same type as A, because at the time we haven't decided to insert an autoCopy yet. In discussing with other team members, we realized that we may want to revisit/finalize semantics for returning an aliasing array.

Should ArrayViews copy-out?

One possible approach to this problem is that ArrayViews should always be copied when returned. This would allow us to more easily determine the return type of the retOrig example above, and is a consistent rule for users to remember. When considering this option, also consider the following code:

config var test = false;

var glob : [1..10] int = 1..10;

proc ret() {
  if test {
    return glob[1..5];
  } else {
    return glob;
  }
}

writeln(ret());

My gut reaction is that this should be a valid Chapel program. If we did not copy-out here, then there would be a type mismatch due to the ArrayView implementation.

Should we allow arrays/ArrayViews to be returned by `ref` ?

ArrayViews development progressed very far without running into issues related to returning an ArrayView, suggesting that this isn't a commonly used pattern in our testing suite. That said, could/should we support arrays and ArrayViews to be returned if the ref return-intent is used?

If one modified the function ret in the previous code sample to have the ref return-intent, I believe that the program would no longer be valid. In such a case, I think we should have a better error message than just complaining about a type mismatch.

mppf commented 7 years ago

That said, could/should we support arrays and ArrayViews to be returned if the ref return-intent is used?

We already have code that returns arrays by ref or const ref and that works OK. But returning a local slice expression (such as glob[1..5]) by ref or const ref seems wrong to me and requires a fair amount of acrobatics in the compiler to support (see fixupNewAlias for some of it).

I'm tentatively in favor of the copy-out approach generally for ArrayViews.

bradcray commented 7 years ago

In the case of default return intents, using copy-out for array views (slices, rank change, reindexing) seems appropriate and (I think?) consistent with what we do today. I'm curious and/or concerned about the level of effort required to make this happen in the compiler based on Ben's note about the phase ordering between inferring return types and inserting copies for array returns.

In the case of 'ref' return intents, I could potentially imagine returning the array view itself by ref if that works and makes sense. But I don't feel confident that I'm anticipating cases where it would cause problems.

mppf commented 7 years ago

Re level of effort for returning slices by value - I think it would be less effort to make this change than it would be to try to preserve the current behavior with new array views strategy. I think the necessary changes can mostly follow along with changes I made for returning array elements in tuples by value.

vasslitvinov commented 7 years ago

Given that an array view is a kind of array, it makes sense to treat it the same way:

"copy out" as the default return intent
"ref-if-modified" as the default argument intent
if it is created locally, can't return it by ref

vasslitvinov commented 7 years ago

Curiosity question: the situation like var a = b; or when a return a; is assigned into b -- resulting in a and b having different types -- are we thinking to make it specific to arrays or allow it with user-defined types e.g. records as well?

If the latter, the mechanism is a proc R.init(other: R) that generates a different instance, or are there other cases?

mppf commented 7 years ago

Just as a straw-man proposal, we could have something like the following to distinguish returning a copy of a slice and returning a slice (that continues to alias whatever it aliased):

var A:[1..10] int;
proc returnsCopy() {
  return A[1..4];
}
proc returnsSlice() {
  return => A[1..4];
}

I think some of us have been arguing that proc returnsSlice() ref should be an option there but I don't think I can get behind that proposal. I think it should be separate from ref return intent.

vasslitvinov commented 7 years ago

To answer my curiosity question...

The initializer team addressed it recently by postponing it until we have a compelling use case. Meanwhile the compiler behavior remains unspecified. It is out intention that if the program compiles without errors, it should execute and exhibit reasonable behavior at run time in this particular respect.

mppf commented 7 years ago

Here is slightly more information about why I don't want to use ref return intent on a function like returnSlice. When you indicate ref return intent, it is an error to return a local variable - and that includes compiler temporaries. So for example, the following program would be in error:

record R { var x: int; }
proc makeR(arg:int) { return new R(arg); }
proc erroneous() ref {
  return makeR(1);
}

Why should a slice of an array be any different? After all, constructing a slice is some sort of function call. So, such support would amount to adding some special-case stuff to the compiler - basically, the function would return the slice _array by value but without making the usual deep copy. I'm not really pleased with the idea of changing some ref return intent functions, after type resolution, to no longer be ref return.

I don't think we should support returning a local array variable by ref but arguably that would require the same compiler shenanigans as returning a slice by ref. Consider the following:

proc returnsLocalArrayByRef() ref {
  var A:[1..10] int;
  return A;
}

I don't think that should compile. But, enabling slice expressions to return by ref would amount to making a similar thing work (from the compiler's viewpoint, anyway).

vasslitvinov commented 7 years ago

So, such a mechanism for returning a local thing by ref - which you are proposing for a slice, however it can be applied to an array or, really, anything - amounts to passing ownership of the local thing to the caller. So, for example, the caller needs to deallocate the referencee when done.

This sounds like a new feature we can add to Chapel. I like it.

It can be implemented like this: Such a ref-returning function also generates a flag, either compile-time or run-time, indicating whether the ownership is being transferred. The caller then examines this flag and deallocates as needed - when the reference goes out of scope.

mppf commented 7 years ago

I don't think I'm proposing anything that requires any action at all when deallocating a ref variable. I think one of the key design points of ref variables and ref returns is that they have no impact on the lifetime of the value in question. I don't think we should change that for array slices.

vasslitvinov commented 7 years ago

Discussing this off-list with Michael, we clarified that the key point of his proposal:

proc returnsSlice() {
  return => A[1..4];
}

is to allow returning the slice itself. Cf. by default a new array - which is initialized from the slice - would be returned:

proc returnsArray() {
  return A[1..4];
}

In both cases, the return is done "by value", i.e. the caller gets ownership of the thing being returned.

What is a good syntax to differentiate returning the slice itself from returning an array derived from the slice? E.g. Michael is proposing return => sliceExpression;

benharsh commented 7 years ago

I've been working on this change, and for the most part things are going smoothly. I have run into a couple of tests under studies/amr that want to implement their own kind of slicing functions via "proc this". Unless I add a magic pragma, they will copy-out and the tests will be incorrect. I encountered a similar issue with "localSlice" in the various distributions, but I'm not as concerned with a pragma there.

benharsh commented 7 years ago

Regarding a sort of "alias-return" syntax, I'd prefer if it went in the usual return-intent slot. I don't like "=>" for this purpose though.

It's also not just 'alias-return', it's: do not copy on the way out and do not destroy yet.

vasslitvinov commented 7 years ago

To distinguish between returning a new array created from the returned slice ex. return A[1..3]; and returning the slice itself - that forces us to add another feature to Chapel, perhaps with new syntax.

My proposal is to simplify and always return the slice itself.

I believe this preserves the benefits of the current approach of converting a slice to an array in the callee. For example:

proc returnsSlice() return A[1..3];

var B = A[1..3];
var C = returnsSlice();

The semantics in both cases above is the same: RHS is a slice and LHS is a new array. This is easy to explain to the user.

Ben H pointed out a couple of cases that are less rosy, yet can be handled in a reasonable manner.

// (A) should the return type be an array-view array or a regular array?

if .... then return A[1..3];
        else return B;

// (B) how to return a slice of a local array?

var C: [1..5] real;
return C[1..3];

(A): To do this, allow implicit conversion from a slice to an array. Just like we can write if ... then return 5 else return 5.5;, the user can write (A) and the slice will be coerced to a new array along the true branch. If the else branch is param-folded away, then the slice would be returned.

(B) We can have the "can return a local array" feature to take precedence over the "a slice is returned as-is" feature. That is, the callee converts the slice to a new array and returns it, as if returning a local array.

bradcray commented 7 years ago

Returning belatedly to this conversation...

I don't think we should support returning a local array variable by ref but arguably that would require the same compiler shenanigans as returning a slice by ref. Consider the following:
proc returnsLocalArrayByRef() ref {
  var A:[1..10] int;
  return A;
}
I don't think that should compile. But, enabling slice expressions to return by ref would amount to making a similar thing work (from the compiler's viewpoint, anyway).

I still prefer using 'ref' for the purpose of returning a slice by reference. In saying that, I want to clarify that I don't think the following case should be legal:

proc returnLocalSliceByRef() ref {
  var A: [1..10] real;
  return A[1..3];
}

(for the obvious reason that A won't exist after the function returns, so returning an alias to it is a bad idea). But I do think the following should be legal:

var A: [1..10] real;
proc returnNonlocalSliceByRef() ref {
  return A[1..3];
}

Why do I think this?

In other recent array semantics conversations, we've talked about intents on arrays being w.r.t. the array elements themselves, not the descriptors used to implement the arrays. So while the array descriptor used to represent the slice is technically local to the routine, the user isn't particularly aware of / concerned with that descriptor so much as the array elements; so they shouldn't be burdened by that implementation detail. Put the opposite way, I think it's reasonable to put in the effort to have the compiler copy the descriptor out as necessary to make this work.
Because I think this is an important idiom to support and the return intent field is otherwise unused in this case. I.e., if we express this idiom, we're going to have to put in some effort to support it in any case. So why put in that effort to introduce and implement a new/different syntax rather than using an existing syntax that seems to make sense in this scenario and is otherwise unused?

Note that there are tests that were edited to use a 'pragma "no copy return"' to work around this issue as part of the array views work, which is OK as a short-term fix, but seems like a lame long-term solution. Specifically, I'd be happiest if we were able to remove those pragmas and implement the codes in the way we'd want users to in time for the 1.15 release...

The tests in question are:

test/studies/amr/... due to:
- test/studies/amr/lib/grid/GridVariable_def.chpl
- test/studies/amr/lib/level/LevelVariable_def.chpl
test/studies/hpcc/HPL/vass/hpl.hpcc2012.chpl
test/optimizations/bulkcomm/asenjo/ptransDR/v1/ptrans.chpl

So I wanted to see whether it would be possible to persuade Michael (and Vass?) over to this perspective, and if not, to clarify what the objections are. Thanks!

mppf commented 7 years ago

@bradcray - I think I could be convinced by the "ref means elements by ref" argument for

var A: [1..10] real;
proc returnNonlocalSliceByRef() ref {
  return A[1..3];
}

but last time I implemented something along these lines, I wasn't so confident that simple syntactic variations wouldn't just introduce bugs. In other words, the compiler would do fragile pattern-matching to make it work.

How can we make the above example work but for the compiler to give an error for this:

proc returnLocalSliceByRef() ref {
  var A: [1..10] real;
  return A[1..3];
}

And couldn't there be arbitrary function calls and refs involved? This one should work:

proc identity(ref array) ref return array;
var A: [1..10] real;
proc harderOK() ref {
   ref x = identity(A[1..3];
   return x;
}

But this one ought to be an error:

proc identity(ref array) ref return array;
proc harderError() ref {
   var A: [1..10] real;
   ref x = identity(A[1..3];
   return x;
}

bradcray commented 7 years ago

I think your concerns are valid, but I'm curious whether you think they're specific to the "use keyword 'ref' to return slices by reference" approach? I.e., wouldn't the previous proposals involving '=>' have the similar challenges?

While it's clearly desirable to generate errors for cases like this as often as possible, do you think it's strictly required that the compiler do so in all cases? Does your concern not have equivalent cases where we return refs in other contexts? I guess I'm wondering whether we can consider it a user error like other illegal cases that we can't [easily / possibly] check for at compile-time. (i.e., is this equivalent to the halting problem?)

bradcray commented 7 years ago

Seeing if time has brought us any closer to resolving this issue given that it remains one of the arguable holes in the language and that @npadmana has run afoul of it in: this StackTrace question. I continue to think that a ref return intent is an intuitive way to mark such procedures and am still curious for answers from @mppf to the questions I asked just above.

mppf commented 7 years ago

@bradcray - most of my hesitation has to do with the potential for confusion in the implementation. I might prefer to get the viewpoint of somebody else who hasn't been neck-deep in this area of compiler in order to evaluate what would be usable and not too surprising.

I personally think the ref return intent in this context is confusing - but that's because I know it's returning a temporary by value... and I'm willing to believe that a user might actually find it more intuitive than some other syntax. Some other new syntax wouldn't be confusing to me in because it wouldn't conflict with the idea of what ref already does.

I'm somewhat convinced by ref means elements by ref, from a strictly philosophical point of view.

Maybe we could just brainstorm some syntaxes and present them to a couple of people to give us a two-cent usability test on them.

I also think that it's interesting that the tail end of this conversation actually brought up borrow-checking-like issues before we had started using that term for it. Seems like at least the compiler checks element here is related.

bradcray commented 7 years ago

I'm somewhat convinced by ref means elements by ref, from a strictly philosophical point of view.

This is the philosophy we shot for when we started designing the language and it seems to me that it's also been used in recent array semantic discussions so it would seem a bit odd to do otherwise here.

I might prefer to get the viewpoint of somebody else who hasn't been neck-deep in this area of compiler in order to evaluate what would be usable and not too surprising.

Does the fact that @npadmana naturally went here on his StackOverflow question count as such a viewpoint? (I swear I didn't put him up to it -- intentionally, anyway).

Also interesting, in a follow-up comment he noted:

it appears that I can return a ref to a slice in an iterator (using a ref return intent). That's an interesting difference with return from a proc...

If we do use ref for this, it seems very orthogonal to returning other array expressions by ref to me (which might simply be the case of "returning arrays"?) If we did not use ref for this, what would ref mean in such cases?

I'm also curious how @benharsh and @vasslitvinov are feeling about this given that they weighed in on the thread earlier. And @ben-albrecht might be another useful opinion from a user point-of-view (does the linear algebra library provide any insights?)

benharsh commented 7 years ago

Returning a slice by ref intent feels natural/intuitive to me.

I think we'd like to support a similar notion for user records. Perhaps the ref return-intent for arrays would be sugar for this other functionality?

mppf commented 5 years ago

I just wanted to point out that this question is a little bit related to "What should happen when a promoted expression is returned?" #7994.

mppf commented 5 years ago

Closing this issue because I think the main open question is now more directly represented in #12178.

vasslitvinov commented 4 months ago

This issue was discussed in https://stackoverflow.com/questions/44974903/how-to-return-a-reference-to-a-slice-of-an-array-in-chapel and animated (!) in https://www.youtube.com/watch?v=9fYhva2P0rg

chapel-lang / chapel