HaxeFoundation / haxe-evolution

Repository for maintaining proposal for changes to the Haxe programming language
111 stars 58 forks source link

Multiple argument array access #72

Closed ErikRikoo closed 2 years ago

ErikRikoo commented 4 years ago

Changes to the compiler to allow array[a, b, ....] Rendered Version

RealyUniqueName commented 4 years ago

Motivation seems weak to me. Please, provide some use cases.

RealyUniqueName commented 4 years ago

The problem described in motivation section is quite easy to solve with a simple function:

function getArrayArgs(e:Expr):Array<Expr> {
    switch e {
        case macro $arr[$arg]:
            var args = getArrayArgs(arr);
            args.unshift(arg);
            return args;
        case _:
            return [];
    }
}

It will collect array arguments of whatever dimensions:

getArrayArgs(macro a[1][2][3]).length; // 3
jdonaldson commented 4 years ago

I really like the idea for this. Python provides an easy multidimensional selection scheme for its list type, and this technique enables easier scripting control of natively defined n-dimensional array implementations (typically in C/Fortran). From the "NDArray" implementations, you can put together higher order datastructures like dataframes, tensors, etc. that behave consistently across platforms (especially since they most likely interact with the same low level native C libraries).

All of these related datastructures take advantage of low level SIMD style calculations (single instruction, multiple data). This technique enables python code to take full advantage of modern processors in a data-intensive pipeline (rather than relying on python's slow for-loops, etc). As core counts increase, these SIMD techniques are likely to become increasingly common, even if you're not working with GPUs or in data-heavy domains.

ErikRikoo commented 4 years ago

Just to stimulate the discussion, I may allow myself to tag members of the compiler. So what do you think of it? @ncannasse @Simn @hughsando @waneck @andyli @nadako @RealyUniqueName @jdonaldson Thanks for your contribution

nadako commented 4 years ago

I can see myself using such syntax for 2D grids and such, if it is also supported by operator overloading. One thing I'm a bit concerned about is if "reserving" comma like this could block some other potential syntax, like tuples, but it doesn't seem like we're gonna have them anytime soon, so I don't know...

ErikRikoo commented 4 years ago

@nadako Do you think I should extend my proposal to operator overloading? And I think there was a solid against for tuples

Simn commented 4 years ago

I have nothing against the syntax itself, but in my opinion it's simply not worth the complications for Expr. If the focal point here is macros, then the difference between base[a, b] and base(a, b) is purely aesthetic. So unless there's a really convincing use-case, I'm a no-vote on this one.

ErikRikoo commented 4 years ago

I will update my proposal to extend operator overloading I think :)

hughsando commented 4 years ago

The python power comes from having a single array of 2D (or N-D) memory, but in haxe, a 2D array is an array of arrays. This is more like a list-of-lists in python. Each row of the 2D array can have different lengths. So implementing this with actual arrays would not be very useful. The developer should be able to choose from and array-of-arrays, a single array with a fixed(compile-time) width (stride) or a single array with runtime a width somehow stored along with the array pointer.

I actually have a tensor class, and use this solution in haxe:

   public function get(idx0:Int,?idx1:Int, ?idx2:Int, ?idx3:Int, ?idx4:Int):Float
   {
      return tdGetAt(this, idx0, idx1, idx2, idx3, idx4);
   }

   @:op([])
   public function at(inIndex:Int):Float
   {
      return tdAt(this, inIndex);
   }

I might be able to do something with 'rest', but a series of "[]" overloads would be much more satisfying.

So basically, I see this as an operator overload solution, not an array implementation solution.

ErikRikoo commented 4 years ago

@hughsando this proposal was not talking about a rework of the haxe array implementation, but more about [] operator overloading at least during macro

ncannasse commented 4 years ago

I'm against adding more syntax when there are already good alternatives. This makes things much more complicated in the whole compiler just to be able to write a[0,1] instead of a(0,1) which is not a good enough reason imho.

ErikRikoo commented 4 years ago

@ncannasse what about operator overloading? Then, it allows then to directly write a[0, 1] = ...? I will rewrite the proposal to extend it to operator overload

ErikRikoo commented 4 years ago

I updated the syntax to operator overloading to fit the comments given. @ncannasse @Simn @hughsando @waneck @andyli @nadako @RealyUniqueName @jdonaldson What do you think of it now? Thanks for your contribution

ibilon commented 4 years ago

At runtime we need to add a lot more logic or return proxy objects to be able to use this syntax. It adds complexity and overhead to the code.

If those object have an inline new they'll get inlined and won't increase the runtime overhead.

ErikRikoo commented 4 years ago

@ibilon Is inline forced or the compiler may decide to not inline it? But I update it :)

RealyUniqueName commented 4 years ago

It would be good if Haxe had this syntax from the beginning or if it was added in a major version change. But now it seems to be a big change for a small benefit.

Aurel300 commented 4 years ago

I mostly agree with @RealyUniqueName about this being too large of a change before a major version. But we could label this as "next-major" or something.

Re: the operator overload syntax. If you consider Python numpy arrays, an important feature is that the indexing or any given N-dimensional array accepts any number of indices (less than or equal to N).

>>> import numpy as np
>>> arr = np.array([ i for i in range(2 ** 4) ])
>>> narr = arr.reshape(2, 2, 2, 2)
>>> narr[0]
array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])
>>> narr[0, 0]
array([[0, 1],
       [2, 3]])
>>> narr[0, 0, 0]
array([0, 1])
>>> narr[0, 0, 0, 0]
0
>>> narr[0, 0, 0, 0, 0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: too many indices for array
>>> type(narr)
<class 'numpy.ndarray'>

And of course, this is determined at runtime, since to Python, the type does not carry information about its dimensionality.

Coming back to Haxe, the only currently-sane way of supporting this feature would be to allow the hypothetical operator overload to take the indices in the form of an array, rather than sequential arguments. Just like the ExprDef you propose. It also means you can't really restrict such operator overloads to take arguments with particular types, either it is indices:Array<Any> for any type of index, or indices:Array<Int> for Int indexing only.

One more reason that the overload syntax without allowing arbitrary argument counts is the fact that you can do this relatively easily with abstracts for a known dimensionality:

class Main {
  public static function main():Void {
    var arr = new A2D(3, 2);
    for (y in 0...2) for (x in 0...3)
      arr[y][x] = '$x,$y';
    trace(arr[1][1]);
  }
}

@:forward(width, data)
abstract A2D<T>({width:Int, data:Array<T>}) from {width:Int, data:Array<T>} {
  public inline function new(width:Int, height:Int) {
    var data = [];
    data.resize(width * height);
    this = {width: width, data: data};
  }

  @:arrayAccess inline function getRow(y:Int):A2DRow<T>
    return new A2DRow(y, this);
}

abstract A2DRow<T>({y:Int, a:A2D<T>}) from {y:Int, a:A2D<T>} {
  public inline function new(y:Int, a:A2D<T>)
    this = {y: y, a: a};

  @:arrayAccess inline function get(x:Int):T
    return this.a.data[this.a.width * this.y + x];

  @:arrayAccess inline function set(x:Int, v:T):T
    return this.a.data[this.a.width * this.y + x] = v;
}

(Of course you could have more error checking, or nest the abstract deeper, or generate any level of dimensionality with macros.)

With -D analyzer-optimize, the generated JS for the code in Main turns into:

    var data = [];
    data.length = 6;
    data[0] = "" + 0 + "," + 0;
    data[1] = "" + 1 + "," + 0;
    data[2] = "" + 2 + "," + 0;
    data[3] = "" + 0 + "," + 1;
    data[4] = "" + 1 + "," + 1;
    data[5] = "" + 2 + "," + 1;
    console.log("Main.hx:6:",data[4]);
hughsando commented 4 years ago

I'm a big fan of operator overloading to allow [y,x] = ... as a more logical .set(x,y, ...) for array-like objects, but not a fan of binding this specifically to arrays-of-arrays, since this prevents it from being used for things such as images.

ErikRikoo commented 4 years ago

I think it will not be used for array of arrays, as we use an access to get an array and then get an element from it so it would be two array access.

jdonaldson commented 4 years ago

I mostly agree with @RealyUniqueName about this being too large of a change before a major version. But we could label this as "next-major" or something.

Yeah, this would be my vote as well, there's a lot of technical gaps that would need to be overcome in order for this feature to feel "first class". I don't think it could/should be done on a minor revision.

Re: the operator overload syntax. If you consider Python numpy arrays, an important feature is that the indexing or any given N-dimensional array accepts any number of indices (less than or equal to N).

Yes, and there's other special "ranged" indices as well (e.g. arr[1:3], arr[1:3,1:]). A first class haxe implementation would support indexing by ranges: bounded or unbounded.

And of course, this is determined at runtime, since to Python, the type does not carry information about its dimensionality.

Yeah, and Haxe should strive to be much better than Python here. To me, this means we would need to support dependent types that encode dimensionality.

Coming back to Haxe, the only currently-sane way of supporting this feature would be to allow the hypothetical operator overload to take the indices in the form of an array, rather than sequential arguments. Just like the ExprDef you propose. It also means you can't really restrict such operator overloads to take arguments with particular types, either it is indices:Array<Any> for any type of index, or indices:Array<Int> for Int indexing only.

Yeah, and for this reason I think array-based indices are the wrong approach. We would need much better "Rest" syntax support, and first class tuple support would be very helpful as well.

One more reason that the overload syntax without allowing arbitrary argument counts is the fact that you can do this relatively easily with abstracts for a known dimensionality:

Yeah, abstracts are the way to go here. It limits their flexibility as a type, but I think that's a sound tradeoff for a datastructure designed for raw speed/parallelism.

ErikRikoo commented 4 years ago

@ncannasse @Simn @waneck @andyli @nadako Any opinion now that I updated the PR?

ncannasse commented 4 years ago

That still seems too weak to me. Adding a new syntax just to be able to do things you could already do otherwise is not enough of a motivation for me to approve.

ErikRikoo commented 4 years ago

@ncannasse is this a veto or should I request a vote?

grepsuzette commented 4 years ago

@erikRikoo Please don’t press too quickly to get it downright accepted or rejected. I mean it is an interesting idea and I don’t thing everything has been said yet; if it gets closed by you or another person we will then all lose ability to add anything. I feel the idea deserves more discussion or at least chances for more discussions, even if it’s likely not for a next minor version.

grepsuzette commented 4 years ago

From my end I have felt the need to have something like that for monad-like access (where type is Option<T>) within an abstract. But I have not pondered yet whether it could apply or not and i don’t remember the details, have to dig it out and see what is was exactly.

ErikRikoo commented 4 years ago

@emugel Yeah, it is true But for now I may admit that I have no idea on something to add ^^' but anyone feel free to suggest ;)

Simn commented 4 years ago

We didn't reach a consensus on this one in our haxe-evolution meeting yesterday.

@hughsando and @jdonaldson presented some valid use-cases. We will have to investigate how tricky this change would be with regards to our expression representation.

Simn commented 2 years ago

lean-reject: questionable benefit

Simn commented 2 years ago

We have rejected the proposal in the haxe-evolution meeting today.

We are cautious about adding additional syntax that has no meaning in the language itself. Such an addition would only be made if there's a clear benefit of the proposed syntax, which we don't see here.