chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.78k stars 418 forks source link

Allow rank change in reindex #12222

Open ben-albrecht opened 5 years ago

ben-albrecht commented 5 years ago

I'm looking for functionality in Chapel similar to numpy.ravel, where a multi-dimensional array is flattened into a 1D array, which still references the elements of the original multi-dimensional array.

It seems like this functionality should fall out of reindex, but reindex currently restricts the new index set to maintain the same rank as the original domain. If this restriction were lifted, it might look something like this:

var D = {1..2, 1..2};
var A: [D] int;

ref reA = A.reindex(1..4);

reA[1] = 1;
A[2,2] = 2;

writeln(A);
/*
  1 0
  0 2
*/
writeln(reA);
/*
  1 0 0 2
*/

This would be nicer to work with if reindex supported unbounded ranges (https://github.com/chapel-lang/chapel/issues/7715) :

var D = {1..2, 1..2};
var A: [D] int;

ref reA = A.reindex(1..);

There is an open question about how to choose the order of the transformation, i.e. row-major vs. column-major. numpy handles this with an optional arg, which could work for us as an enum argument:

numpy.ravel(a, order='C')

order : {‘C’,’F’, ‘A’, ‘K’}, optional The elements of a are read using this index order. ‘C’ means to index the elements in row-major, C-style order, with the last axis index changing fastest, back to the first axis index changing slowest. ‘F’ means to index the elements in column-major, Fortran-style order, with the first index changing fastest, and the last index changing slowest. Note that the ‘C’ and ‘F’ options take no account of the memory layout of the underlying array, and only refer to the order of axis indexing. ‘A’ means to read the elements in Fortran-like index order if a is Fortran contiguous in memory, C-like order otherwise. ‘K’ means to read the elements in the order they occur in memory, except for reversing the data when strides are negative. By default, ‘C’ index order is used.

We could alternatively encode the interpretation of order into the domain map. However, supporting an argument in the transformation seems more versatile.

Somewhat related: #9693

ben-albrecht commented 5 years ago

This came up in the development of #12027.

bradcray commented 5 years ago

This sounds more like a(n aliasing) reshape than a reindex to me...? Or, if it was always going to return a 1D array (which it looks like ravel does?) then I might just call it "flatten" or something like that?

bradcray commented 5 years ago

Also, I'm noting that ravel does something about the order in which things are linearized, but if you're assuming the same array memory is being used, it seems like we wouldn't need or want to do that (i.e., if you cared about the linearization, you'd specify that in the multidimensional array's layout not in the flattening since anything other than what it was would require moving the data around?). Does that seem right?

ben-albrecht commented 5 years ago

This sounds more like a(n aliasing) reshape than a reindex to me...?

It's not clear to me if a reshape-by-reference is a better fit for this than a reshaped-reindex. I don't have a strong preference between the two - I just considered the latter first.

Or, if it was always going to return a 1D array (which it looks like ravel does?) then I might just call it "flatten" or something like that?

We don't necessarily have to limit ourselves to the 1D case like ravel does. I bet there's some use-case where mapping the indices of a 1D array to an ND array is needed.

I personally think it'd be fine for this functionality to be contained with reindex or reshape rather than introducing a new helper routine.

While we're using numpy as a design model, I'll mention that there is also a numpy.flatten which is like ravel, but always returns a copy of the array - something we can already do with reshape() today.

ben-albrecht commented 5 years ago

if you cared about the linearization, you'd specify that in the multidimensional array's layout not in the flattening since anything other than what it was would require moving the data around?

Right. That's what I meant by this statement:

We could alternatively encode the interpretation of order into the domain map.

.. which seems reasonable to me.

As an addition to this, we might consider providing an argument (like python does) to override the default order (which would be determined from the array layout), e.g. a users wants to linearize as column-major order, but the array is row-major order in memory.

bradcray commented 5 years ago

we might consider providing an argument (like python does) to override the default order (which would be determined from the array layout), e.g. a users wants to linearize as column-major order, but the array is row-major order in memory.

Just to make sure I'm understanding: The impact of this would be to shuffle the elements in the original multidimensional array so that we could continue using the same memory (i.e., A[i,j] would become A[j,i] after this call was made for the 2D array in which we changed between CMO and RMO) as opposed to creating an array copy in this case?

ben-albrecht commented 5 years ago

Just to make sure I'm understanding: The impact of this would be to shuffle the elements in the original multidimensional array so that we could continue using the same memory (i.e., A[i,j] would become A[j,i] after this call was made for the 2D array in which we changed between CMO and RMO) as opposed to creating an array copy in this case?

I was thinking that the original array's memory would remain untouched, and that the reindex gives a "linearized as if CMO" or "linearized as if RMO" view of the array, depending on the argument. Here's an example:

// Pretend that Chapel supports 2d literals for a moment
var A = [[1,2,3],
         [4,5,6]];

// A is RMO
writeln(A[1,2]);
// 2

ref rowA = A.reindex(1..); // defaults to order=Order.Row, due to A's layout being RMO
ref colA = A.reindex(1.., order=Order.Column);

// A is still RMO
writeln(A[1,2]);
// 2

writeln(rowA);
// [1 2 3 4 5 6]

writeln(colA);
// [1 4 2 5 3 6]
bradcray commented 5 years ago

Oh, I see... So the expectation is that when indexing into the 1D view of the array, the descriptor resulting from this operation would do math on the index to transform it into a multidimensional index for the original array. I'd been imagining that we would simply create a default rectangular array that aliased the multidimensional array's element buffer and use normal/simple 1D indexing to access them.

Given that clarification, I agree with your original intuition that this is more like reindexing than it is reshaping (since reindexing also does transformations on indices to turn new ones into original ones).

[Unfortunately for you, this type of reindexing also makes me want to run away screaming due to PTSD... :) ]

ty1027 commented 5 years ago

I bet there's some use-case where mapping the indices of a 1D array to an ND array is needed.

FWIW, in my use cases, I often need to express a chunk of memory (allocated by an array in a user-defined type) as 1D or 2D arrays in different routines, such that:

-- In 1D case, I want to access that array as x( 1 .. Ndof ) (where Ndof = number of degrees of freedom) in a general way, so that this is not a problem/system-specific (i.e., a general N-dimensional problem);

-- At the same time, I want access the same chunk of memory as coord( 1 .. 3, 1.. Natoms ), where Natoms is the number of atoms (or particles) to compute the potential energy conveniently in some routine (e.g., using a pairwise-additive force field).

This is straightforward in Fortran because there is a rank-remapping array pointer (which can create an array view with a different rank from the array pointed). I usually wish to distribute a higher level of objects (e.g., an entire "molecular system" that contains many molecules) rather than the above array itself (e.g., which represents a single molecule inside the entire system), so the above array is intended to be local.

Just as one example case...