data-apis / array-api

RFC document, tooling and other content related to the array API standard
https://data-apis.github.io/array-api/latest/
MIT License
211 stars 44 forks source link

Can we promise to delegate to reverse operators ? #732

Closed NeilGirdhar closed 8 months ago

NeilGirdhar commented 8 months ago

Background

Ordinary NumPy arrays have a peculiar operator behaviour:

It is too late for NumPy to change this since this would break a lot code.

Motivation

Consider someone implementing a class that must interact with arrays. For example, a rectangle class defined by storing the opposite corners of a rectangle. If x = Rectange(...) and a is an array, we want to support both x + a and a + x. If the decomposition/collation described above happens, then this will not work: a + x will try to build an array of rectangles offset by floats.

Proposal

Conformant implementations of the Array API should return NotImplemented when they don't know how to evaluate an operator with their arrays—rather than this behavior being implementation-defined. Also, it would be useful to have a test in the conformance test suite that verifies this.

rgommers commented 8 months ago

Interesting question. The discussion you linked in https://github.com/numpy/numpy/issues/9677 is quite relevant. The NumPy behavior is clearly not great, but @seberg makes a good point:

Frankly, if you want to add such a promise there (which is an issue for the Array API), you better have an idea how to pull of changing it in NumPy. Saying numpy must change it is useless if it can't.

The Array API does not even define a way to mix array objects, interacting with a custom object is undefined to begin with anyway.

As noted in https://data-apis.org/array-api/latest/purpose_and_scope.html, subclassing of an array class is out of scope; an implementation may not even allow this. E.g., JAX doesn't seem to support it at all (https://github.com/google/jax/issues/4269), and in CuPy subclassing is mostly supported since 2022 only (see https://github.com/cupy/cupy/issues/3972#issuecomment-1126011566) and still undocumented as far as I can tell.

So I suspect that the best we can do here is a note that recommends to return NotImplemented as it's the idiomatic thing to do in Python - but also says that this isn't guaranteed to be the case and notes that this is out of scope for the standard.

NeilGirdhar commented 8 months ago

Frankly, if you want to add such a promise there (which is an issue for the Array API), you better have an idea how to pull of changing it in NumPy. Saying numpy must change it is useless if it can't.

I guess I'm confused by this wrt the Array API. I understand that it's not possible to change Numpy. What's the problem with making the Array API promise this? Are there implementors that can't do it for some reason? Even NumPy's implementation of the Array API should be able to do it, right?

As noted in https://data-apis.org/array-api/latest/purpose_and_scope.html, subclassing of an array class is out of scope

What do you think of the example I gave of a rectangle class? Should it be possible to write such a class and have it interact normally with the array API?

rgommers commented 8 months ago

Even NumPy's implementation of the Array API should be able to it, right?

No, the implementation is in the main namespace from 2.0 onwards - and even if that wasn't the case, it would still want to reuse the ndarray object, so changing behavior of operators would not really be feasible.

What do you think of the example I gave of a rectangle class? Should it be possible to write such a class and have it interact normally with the array API?

I don't think so, no. It's not just about operator support or subclassing. It'd be very hard to see how to implement any such rectangle object that can be used with regular functions, since those only support array input and not custom objects.

For every arrays library in existence, custom objects like this are either unsupported or very messy. I don't see any chance of fixing that in this standard.

NeilGirdhar commented 8 months ago

No, the implementation is in the main namespace from 2.0 onwards

Thank you for explaining. I think I understand now.

  • and even if that wasn't the case, it would still want to reuse the ndarray object, so changing behavior of operators would not really be feasible.

If it weren't the case, then numpy.xp.ndarray could reuse numpy.ndarray by composition though? and then you could forward everything to the composed object—except of course taking care to return NotImplemented where appropriate. This is what I imagined was happening. I didn't realize that numpy.ndarray is going to implement the Array API.

Doesn't that mean that for a = numpy.xp.asarray(...), then a has all kinds of extra methods and attributes not specified by the Array API?

So I suspect that the best we can do here is a note that recommends to return NotImplemented as it's the idiomatic thing to do in Python

Right, now I understand this. This would be a nice start—maybe one day, we'll be able to shake off this technical debt.

rgommers commented 8 months ago

Doesn't that mean that for a = numpy.xp.asarray(...), then a has all kinds of extra methods and attributes not specified by the Array API?

Yes, sure. That is perfectly fine though - it will implement a superset, and that doesn't impede the ability to write portable code which targets the standard.

seberg commented 8 months ago

That was not how it is designed for better or worse. Remember, the Array API mainly targets libraries. Those get passed a NumPy array by the user and must return a NumPy array. So if xp.asarray() converts to a custom np.xp.ndarray object, that can work, but you will also need a mechanism like xp.unwrap() to return the right thing again (and libraries must use it then).

That would be a legitimate design and I do think it would have remove some obstacles. But, it is not the current choice it looks like that is probably not a big headache. For the dataframe API that was never even an option I think, so their choice is very different and it entirely relies on custom objects.

NeilGirdhar commented 8 months ago

Got it. Thank you @seberg and @rgommers for patiently explaining this to me! I guess I will close this as impossible, but I hope that the recommendation for other libraries ends up somewhere.

asmeurer commented 8 months ago

It does seem kind of odd to include the __r*__ methods in the spec and not say anything about this. The only point of those methods is to interface with other classes.

Note that for numpy specifically, you can work around this by making __array_ufunc__ return NotImplemented.

TBH, I'm a little disappointed that nothing was done about this in numpy 2.0 specifically for the case where a non-object array is promoted to an object array. That's a major source of bugs, or at worse, inscrutable error messages in my experience. I know there are some libraries like pandas that use object arrays on purpose, but requiring more explicitness about something becoming an object array would be good thing. It's inline, for instance, with the change to make ragged arrays raise an exception instead of silently becoming an object array.

asmeurer commented 8 months ago

By the way, this was discussed before at https://github.com/data-apis/array-api/issues/229#issuecomment-881705036