Closed NeilGirdhar closed 8 months ago
Interesting question. The discussion you linked in https://github.com/numpy/numpy/issues/9677 is quite relevant. The NumPy behavior is clearly not great, but @seberg makes a good point:
Frankly, if you want to add such a promise there (which is an issue for the Array API), you better have an idea how to pull of changing it in NumPy. Saying numpy must change it is useless if it can't.
The Array API does not even define a way to mix array objects, interacting with a custom object is undefined to begin with anyway.
As noted in https://data-apis.org/array-api/latest/purpose_and_scope.html, subclassing of an array class is out of scope; an implementation may not even allow this. E.g., JAX doesn't seem to support it at all (https://github.com/google/jax/issues/4269), and in CuPy subclassing is mostly supported since 2022 only (see https://github.com/cupy/cupy/issues/3972#issuecomment-1126011566) and still undocumented as far as I can tell.
So I suspect that the best we can do here is a note that recommends to return NotImplemented
as it's the idiomatic thing to do in Python - but also says that this isn't guaranteed to be the case and notes that this is out of scope for the standard.
Frankly, if you want to add such a promise there (which is an issue for the Array API), you better have an idea how to pull of changing it in NumPy. Saying numpy must change it is useless if it can't.
I guess I'm confused by this wrt the Array API. I understand that it's not possible to change Numpy. What's the problem with making the Array API promise this? Are there implementors that can't do it for some reason? Even NumPy's implementation of the Array API should be able to do it, right?
As noted in https://data-apis.org/array-api/latest/purpose_and_scope.html, subclassing of an array class is out of scope
What do you think of the example I gave of a rectangle class? Should it be possible to write such a class and have it interact normally with the array API?
Even NumPy's implementation of the Array API should be able to it, right?
No, the implementation is in the main namespace from 2.0 onwards - and even if that wasn't the case, it would still want to reuse the ndarray
object, so changing behavior of operators would not really be feasible.
What do you think of the example I gave of a rectangle class? Should it be possible to write such a class and have it interact normally with the array API?
I don't think so, no. It's not just about operator support or subclassing. It'd be very hard to see how to implement any such rectangle object that can be used with regular functions, since those only support array
input and not custom objects.
For every arrays library in existence, custom objects like this are either unsupported or very messy. I don't see any chance of fixing that in this standard.
No, the implementation is in the main namespace from 2.0 onwards
Thank you for explaining. I think I understand now.
- and even if that wasn't the case, it would still want to reuse the
ndarray
object, so changing behavior of operators would not really be feasible.
If it weren't the case, then numpy.xp.ndarray
could reuse numpy.ndarray
by composition though? and then you could forward everything to the composed object—except of course taking care to return NotImplemented
where appropriate. This is what I imagined was happening. I didn't realize that numpy.ndarray
is going to implement the Array API.
Doesn't that mean that for a = numpy.xp.asarray(...)
, then a
has all kinds of extra methods and attributes not specified by the Array API?
So I suspect that the best we can do here is a note that recommends to return NotImplemented as it's the idiomatic thing to do in Python
Right, now I understand this. This would be a nice start—maybe one day, we'll be able to shake off this technical debt.
Doesn't that mean that for
a = numpy.xp.asarray(...)
, thena
has all kinds of extra methods and attributes not specified by the Array API?
Yes, sure. That is perfectly fine though - it will implement a superset, and that doesn't impede the ability to write portable code which targets the standard.
That was not how it is designed for better or worse. Remember, the Array API mainly targets libraries. Those get passed a NumPy array by the user and must return a NumPy array.
So if xp.asarray()
converts to a custom np.xp.ndarray
object, that can work, but you will also need a mechanism like xp.unwrap()
to return the right thing again (and libraries must use it then).
That would be a legitimate design and I do think it would have remove some obstacles. But, it is not the current choice it looks like that is probably not a big headache. For the dataframe API that was never even an option I think, so their choice is very different and it entirely relies on custom objects.
Got it. Thank you @seberg and @rgommers for patiently explaining this to me! I guess I will close this as impossible, but I hope that the recommendation for other libraries ends up somewhere.
It does seem kind of odd to include the __r*__
methods in the spec and not say anything about this. The only point of those methods is to interface with other classes.
Note that for numpy specifically, you can work around this by making __array_ufunc__
return NotImplemented
.
TBH, I'm a little disappointed that nothing was done about this in numpy 2.0 specifically for the case where a non-object array is promoted to an object array. That's a major source of bugs, or at worse, inscrutable error messages in my experience. I know there are some libraries like pandas that use object arrays on purpose, but requiring more explicitness about something becoming an object array would be good thing. It's inline, for instance, with the change to make ragged arrays raise an exception instead of silently becoming an object array.
By the way, this was discussed before at https://github.com/data-apis/array-api/issues/229#issuecomment-881705036
Background
Ordinary NumPy arrays have a peculiar operator behaviour:
__add__
fails, then instead of returningNotImplemented
(which would be idiomatic Python),float
orint
), and then calls the corresponding reverse operator (like__radd__
) on these elements and the RHSIt is too late for NumPy to change this since this would break a lot code.
Motivation
Consider someone implementing a class that must interact with arrays. For example, a rectangle class defined by storing the opposite corners of a rectangle. If
x = Rectange(...)
anda
is an array, we want to support bothx + a
anda + x
. If the decomposition/collation described above happens, then this will not work:a + x
will try to build an array of rectangles offset by floats.Proposal
Conformant implementations of the Array API should return
NotImplemented
when they don't know how to evaluate an operator with their arrays—rather than this behavior being implementation-defined. Also, it would be useful to have a test in the conformance test suite that verifies this.