chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.79k stars 421 forks source link

Reshaping arrays of arrays into N-D arrays #9693

Open ben-albrecht opened 6 years ago

ben-albrecht commented 6 years ago

Given the task of flattening an array of arrays into a 1D array, this is the cleanest solution we could come up with:

var nestedArr: [1..3] [1..2] int;
var flattenedArr = flat(nestedArr);

  iter flat(arr) {
    for i in arr {
      if isArray(i) {
        for j in flat(i) {
          yield j;
        }
      } else {
        yield i;
      }
    }
  }

// Credit: @daviditen

It would be nice if the standard library / built-in functions supported a better way to accomplish this, and any other conversion of array of arrays to N-D arrays.

One appealing solution to this is to extend the reshape function to work with arrays of arrays. This could work by providing a reshape overload that takes a type argument to specify what the element type of the reshaped array should be, leaving the existing reshape usage unchanged.

Here are some examples of what this might look like:

// Converting an array of arrays to a 1D array
var A: [1..3] [1..4] int;
var B = reshape(A, {1..12}, int);

// Converting an array of arrays to a 2D array
var A: [1..3] [1..4] int;
var B = reshape(A, {1..3, 1..4}, int);

// Converting a 2D array to an array of arrays
var A: [1..3, 1..4] int;
var subArr: [1..4] int;
var B = reshape(A, {1..3}, subArr.type); // Maybe there's a better way to express this...

// Converting a 1D array to an array of arrays
var A: [1..12] int;
var subArr: [1..4] int;
var B = reshape(A, {1..3}, subArr.type); // Maybe there's a better way to express this...
cassella commented 6 years ago

Could there be some sort of ArrayViewReshape along the lines of ArrayViewRankChange and 'ArrayViewSlice`?

bradcray commented 6 years ago

Keep in mind that an array of arrays is not contiguous in memory whereas any given array (typically) is. For that reason, I'd be disinclined to have an operator that provides a view on an array of one type that makes it appear to be an array of a completely different type and memory layout; and for that reason would prefer to have this operator, if it existed, return a new array as in Ben's examples rather than an alias to an existing array as Paul proposes (I'm also reluctant to go back down the array view path many more times if it can be avoided).

I'm not sure how I feel about the proposed reshape routine at present (but am thinking about it). It does seem like there's definite value in having a parallel "linearize" iterator for an array similar to Ben's flat iterator (but parallel) where copies could be made between any arrays with the same arity using zippering:

forall (a, b) in zip(linearize(A), linearize(B)) do
  a = b;

Or maybe this could even be written:

linearize(A) = linearize(B);

Then the question is whether we'd want to provide the convenience function of a reshape that would wrap this capability or just teach people that this is the way to convert between arrays of different shape. There's something that feels a bit clunky about the reshape approach to me, I think relating to the fact that a bunch of stuff defining B's size/shape/domain/distribution is given as arguments to reshape rather than simply being used to define B...

ben-albrecht commented 6 years ago

It does seem like there's definite value in having a parallel "linearize" iterator for an array similar to Ben's flat iterator (but parallel) where copies could be made between any arrays with the same arity using zippering

Agreed - seems like a no-brainer to provide this. Where would this routine live? Maybe Array Operations? Misc Functions?

There's something that feels a bit clunky about the reshape approach to me

While I see your point, I think it's a reasonable trade-off. I think of this as a composite design pattern, where we are providing an intuitive extension to an existing tool with which users are already familiar, reducing the mental barriers to learning the interface.

I think relating to the fact that a bunch of stuff defining B's size/shape/domain/distribution is given as arguments to reshape rather than simply being used to define B...

To be clear, the reshape interface remains the same, with one additional optional argument (eltType). In the case of using non-DefaultRectangular distributions, which is what I think you were talking about, there is definitely redundancy in specifying the domain/element type, but I think that's an acceptable trade-off. Here's what the might look like:

use BlockDist;

var D = {1..12};
var blockD: domain(1) dmapped Block(boundingBox=D) = D;

var A: [1..3] [1..4] int;
var B: [blockD] int = reshape(A, D, int);