chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

when should an array be resized when it is read? #5053

Open mppf opened 7 years ago

mppf commented 7 years ago

I've found it useful to allow some Chapel arrays to be read without knowing their size in advance. In particular, a non-strided 1-D Chapel array that has sole ownership of its domain could be read into, with the read operation resizing the array to match the data. At one point, I prototyped this for JSON and Chapel-style textual array formats (e.g. [1,2,3,4]).

Here is an example of the code I'm interested in supporting:

var A:[1..0] int;
mychannel.read(A);

could read any number of elements into A and adjust its domain accordingly. The alternative is that the code above can only read zero-length arrays.
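Outside the I/O library, the resizing behavior described above can be approximated by giving the array an explicit domain variable that no other array uses, and growing that domain as values arrive. This is a minimal Chapel sketch, not the prototype mentioned above. It assumes the incoming values are already available in a hypothetical `input` array instead of being read from a channel.

```chapel
// Sketch: emulate "resize on read" using an explicit, unshared domain.
// `input` is a hypothetical stand-in for values arriving from a channel.
const input = [1, 2, 3, 4];

var D = {1..0};   // A's own domain; no other array is declared over it
var A: [D] int;   // reassigning D resizes A

for x in input {
  const n = D.size + 1;
  D = {1..n};     // grow by one; elements at indices 1..n-1 are kept
  A[n] = x;       // store the newly "read" element
}

writeln(A);
```

Reassigning `D` resizes `A` while keeping the elements whose indices remain in the domain, which is why this only works cleanly when the domain is not shared with other arrays. Growing by one element per value reallocates repeatedly; a real implementation would probably grow the domain in larger steps.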

This case is particularly relevant because one might imagine a record that stores a variable-length array. Can the default readThis operation provided by the compiler do something reasonable?

record ContainingArray {
   var A: [1..0] int;
}
var r:ContainingArray;
mychannel.read(r);

Or would authors of such records need to implement a custom readWriteThis method to get reasonable I/O behavior?

There are three key questions:

1) Does changing the size of an array being read, when possible, seem like the right idea? Or should reading an array always insist that the input has the same size as the existing array? (I believe the latter matches the rest of the language's behavior for arrays that share domains.)

2) Should any-dimensional rectangular arrays be written in binary in a form that encodes the size of each dimension? (In other words, write the domain first?). Such a feature would make something like (1) possible for multi-dimensional arrays but might not match what people expect for binary array formats. (I don't think we've documented what you actually get when writing an array in binary yet...)

3) Any suggestions for a Chapel array literal format for multi-dimensional arrays? How would you write such arrays in JSON (and would anyone want to)? At one point there was a proposal to put the domain in array literals, like this:

var A = [ over {1..10} ];

but that doesn't really answer how to write multidimensional array literals. One approach would be to store the array elements flat and reshape them while reading, e.g.

var A = [ over {1..2, 1..3}
          11, 12, 13,
          21, 22, 23 ];

where the spacing would not be significant.
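Reading the proposed `over` literal amounts to filling a multidimensional array from a flat, row-major list of elements. A minimal Chapel sketch, assuming the elements and the domain have already been parsed into the hypothetical `flat` and `D` values:

```chapel
// Sketch: fill a 2-D array from elements listed flat in row-major order,
// as reading the "over {1..2, 1..3}" literal would require.
// Declaring flat over 0..5 avoids depending on the literal's base index.
const flat: [0..5] int = [11, 12, 13, 21, 22, 23];  // hypothetical parsed elements
const D = {1..2, 1..3};                             // hypothetical parsed domain

if flat.size != D.size then
  halt("element count does not match the domain");

var A: [D] int;
var k = 0;
for idx in D {      // serial iteration over a rectangular domain is row-major
  A[idx] = flat[k];
  k += 1;
}

writeln(A);
```

Because serial iteration visits the domain's indices in row-major order, the flat list maps onto rows exactly as they appear in the literal, and the spacing in the source text plays no role.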

If we had a reasonable format, we could extend support for (1) to any-dimensional arrays that do not share domains, even for some textual formats.

mppf commented 7 years ago

For question 1 (should we ever resize arrays on reading?), one potential answer is to do such resizing when all of these conditions are met:

  1. the array's domain is not shared
  2. the I/O is using a format compatible with reading-in generally, such as %jr or %jr
  3. the array's current size is 0

Some objections to this strategy include:

One possible alternative would be for the user making I/O calls to indicate whether arrays should resize on read. This strategy has the advantage that one could pass the domain separately (which seems more natural, since the domain will be set by the reading operation).

Another possible strategy is to rely on an initializer: "resizing" an array would be possible when initializing it from a file/channel.

mppf commented 6 years ago

This question is related to an earlier discussion:

https://sourceforge.net/p/chapel/mailman/message/34335394/

benharsh commented 6 years ago

I agree with the condition that such arrays must not share their domain. I also agree that the array's domain should have zero indices. I recommend that array views be excluded from this functionality (perhaps that's simply a matter of not implementing the resizing read/write helpers?).

As long as we continue to support push/pop methods on arrays, I think we should allow resizes during I/O.

The over {1..2, 1..3} formatting doesn't really appeal to me. Off the top of my head, I'd prefer something like:

[[11,12,13], [21,22,23]]

Our IO readers/writers can interpret this format based on their type. An array-of-arrays can read this naturally, or a multidimensional array of integers can recognize the format and process 'nested' arrays as rows (or whatever).
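Reading the nested format into a multidimensional array could work roughly as follows: treat each inner list as a row, check that all rows have the same length, then copy the elements. A minimal Chapel sketch, assuming the input has already been parsed into an array of arrays named `rows` (here simply a nested literal):

```chapel
// Sketch: treat each inner list of [[11,12,13], [21,22,23]] as one row of
// a 2-D array. The nested literal stands in for already-parsed input.
const rows = [[11, 12, 13], [21, 22, 23]];

// All rows must have the same length to form a rectangular array.
var nCols = -1;
for r in rows {
  if nCols == -1 {
    nCols = r.size;
  } else if r.size != nCols {
    halt("ragged rows cannot form a 2-D array");
  }
}

var A: [1..rows.size, 1..nCols] int;

// Copy elements with explicit counters so the literals' base index
// does not matter.
var i = 1;
for r in rows {
  var j = 1;
  for x in r {
    A[i, j] = x;
    j += 1;
  }
  i += 1;
}

writeln(A);
```

An array-of-arrays target could keep `rows` as-is; only a multidimensional target needs the rectangularity check and the copy.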


That said, I'm still not convinced that we should allow push/pop methods on arrays. Perhaps it's too late to change that now. If so, then I feel we need to change other parts of the language to account for this functionality. Why is it that I can write

var A : [1..0] int;
A.push_back(1);

But I cannot modify the domain through the array?

var A : [1..0] int;
A.domain = {1..1};
A[1] = 1;

It's not the push/pop functionality that bothers me, but that it's inconsistent with other parts of the language.

mppf commented 5 years ago

This issue is arguably even worse for associative arrays. In that case, wouldn't the indices need to be stored with the elements?
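To illustrate why the indices would need to be stored: an associative array's elements have no meaningful position, so a writer would have to emit each element alongside its key. A minimal Chapel sketch with hypothetical example keys, writing `key => value` lines (the `=>` layout is only an illustration, not a proposed format):

```chapel
// Sketch: an associative array's elements are meaningless without their
// indices, so each element is written next to its key.
var D: domain(string);
var A: [D] int;

D += "apples";
D += "pears";
A["apples"] = 3;
A["pears"] = 5;

// The iteration order of an associative domain is not specified, so a
// reader could not recover which value belongs to which key by position.
for k in D do
  writeln(k, " => ", A[k]);
```

Reading such output back would mean adding each key to the domain before assigning its value, so resizing is unavoidable for associative arrays.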

mppf commented 5 years ago

Issue #11455 is related.

mppf commented 10 months ago

For records containing arrays, the ability to write an initializer (init) that deserializes should help significantly with this issue.