indexing operator for multidimensional lists

gavinking commented 6 years ago

Ceylon doesn't have true multidimensional lists, but it is possible to work around that using a list of lists as in Java. This works out just perfectly for multidimensional tuples:

value tup2d = [[1, 2], [2, 3], [0,1]];
Integer elem = tup2d[1][0];

However, for other multidimensional lists (including multidimensional arrays) the stacked indexing operator doesn't work:

value tup2d = [ for (i in 0..n-1) [ for (j in 0..n-1) i==j then 1 else 0 ]];
Integer elem = tup2d[1][0]; //compile error!!

I can see several possible solutions to this:

Make the x[i] operator accept a Correspondence? as its LHS instead of only Correspondence as it does today.
Introduce a multidimensional lookup operator of form x[i,j] for things of type Correspondence<Correspondence<...>>, where x[i,j] means what x[i][j] would mean if it were well-typed.
Figure out a way to tag an Array with its dimensions, for example, Array.ofDimensions(2, 3)(0.0) and introduce a multidimensional lookup operator of form x[i,j] where x[i,j] means x[j+3*i].

Option 1 is of course the simplest change.

Options 1 and 2 work for any kind of Correspondence and support the current pattern of using lists of lists (of lists, etc).

Option 3 recognizes that lists of lists probably aren't really a very efficient way to lay out multidimensional arrays of numbers, but only really works for Arrays. Here, a "2D" array of Floats would be represented as an Array<Float>, but its elements could still be accessed as if they were laid out in two dimensions using arr[i, j].

Thoughts?

arseniiv commented 6 years ago

Writing not much, I had encountered one or two cases where I’d be glad to have ?[] operator with ? like in ?.—exactly what is proposed for [] to change to in the first suggestion, so it would at least simplify these cases as a collateral.

kingjon3377 commented 6 years ago

If the [] operator worked as requested in #4517, one could make a multidimensional-array class whose [] operator always returned a Correspondence, just one that at the final dimension would always return null if any earlier dimension was out-of-bounds. (Though that wouldn't help with the comprehension-based example above.)

gavinking commented 6 years ago

@kingjon3377 Yes that might work, and it's a good idea.

We would still have to introduce this new multidimensional-array class somewhere (or perhaps just two classes, one for 2D arrays, and one for 3D arrays), but perhaps that's a good idea anyway, since it would allow optimized underlying storage and APIs.

For example:

value arr = Array2.create(3, 3, (i, j) => i==j then 1 else 0);
Integer? elem = arr[1][2];
Array row = arr[1]; //non-null

Where arr[1] is non-null because Array2D returns an empty (not null!) Array for out-of-bounds indexes.

Still, I'm not sure I'm really comfortable with those empty Arrays. And I dunno, but I feel like even if we did have these multidimensional-array classes, the [i, j] syntax is still somehow more natural:

value arr = Array2.create(3, 3, (i, j) => i==j then 1 else 0);
Integer? elem = arr[1, 2];
Array? row = arr[1,]; //horizontal slice
Array? col = arr[,2]; //vertical slice

However, TBH, I'm not certain how to abstract over slicing for arbitrary dimensions. One way would be to say that the key is a tuple of indexes, but that would exhibit terrible performance. Another way is something like the previously-described Option 3.

ghost commented 6 years ago

@gavinking, I really don’t like your Array2 idea. It works well for bidimensional arrays, but that’s simply not where people’s necessities end. It’s just not scalable, and I think that having Array2, Array3, etc. up to an arbitrary limit is just as bad of an approach for this issue as Tuple2, Tuple3, etc. is for tuples.

What’s wrong with the quite explicit arr[i]?.get(j) again?

CPColin commented 6 years ago

I dig Option 3 for the use case of needing a perfectly rectangular array (or rectangular prism, hyperprism, etc.) and I like the syntax for indexing into it. I'm not sure what I'd use it for, right now, but it'd be nice to have it available, in case the need came up. (Maybe some image processing or something?)

Mandritti commented 6 years ago

There are two cases. Either the model is a list of lists, or the model is a rectangle (or hyperrectangle). The programmer knows which. As to gifting syntax to the first case, I am reminded of that line from Ceylon's motivational preamble, what was it? "Rather clarity than a sea of argot ASCII" ? So I want to advocate for what Ceylon already has in this example (i.e. the type rules on the x[i] operator).

The want for "clear, concise" syntax, for indexing into a list-of-lists that you know might be an OoB, trades off with the mission to be explicit about those possible Nulls. This is where I like Ceylon for making some things "difficult". I would say "conscious". I also think this is where the solutions are for the application to provide. Again, this list of lists is not rectangular; the application's purpose would decide how much of a fuss should be made over its handling of an OoB, not to mention the semantics of doing so. So let them work through Ceylon's highly regular, strongly typed, rich system to obtain their objective - and they'll get a Ceylon source file for it.

In the hyperrectangle case, I'm imagining a language module artifact that would have the conveniences, maybe even optimization. It's different.

I hope I haven't made myself look too foolish.

OlafTitz commented 6 years ago

Is there anything speaking against option 1? It's simple and natural, and avoids special casing.

luolong commented 6 years ago

I am kind of rooting for option 1 as well—This is more or less what people will try right away and it would behave in a least surprising manner.

gavinking commented 6 years ago

Well, sure, the problem with option 1 is that the performance is potentially much worse.

xkr47 commented 6 years ago

How is OoB detected on the JVM & JS in Ceylon? In traditional Java, an exception is thrown by the JVM. In JS, you get undefined.

I'll remind of a comment in #4517 where a slightly-breaking change with a (defaulted) type argument indicating whether Correspondence.get() is optional or not. Alternatively an OmniCorrespondence subinterface of Correspondence could force the return value of e.g. get() to be non-optional.. :trollface:

Anyway I'm getting wild ideas like^*):

the implementation decides whether it stores values in 1- or n-dimensional storage (in the spirit of LinkedList vs ArrayList)
Array<String, 2> where the number of dimensions is a type argument (think of integers as enumerated instances of Integer).. Correspondence<Key, Item, Absent=Null, Dimensions=1> perhaps? Then the type of get() could be Item(Key[Dimensions]). To get a sub-correspondence with only a prefix of the dimensions given would be accessed through another method which accepts an arbitrary number of dimensions.
- This could enable the user to explicitly specify storage strategy through Array<Array<String,1>,1> vs Array<String,2> although in a very clumsy way :)
[1,2,3] is a bidirectional alias for [1][2][3] or maybe [1]?[2]?[3] (TODO ?[] operator)
[1][2]...[k] given k<n returns a Correspondence|Absent regardless of storage model

SIDE NOTE: where I come from, x[i,j] would resolve to x[j][i] but I am ready to take the plunge!

*) because I'm not yet skilled enough to know it won't work

eclipse-archived / ceylon

indexing operator for multidimensional lists #7344