asmeurer opened 6 days ago
FWIW, from an ideal perspective, I still think `arr.iter(axis)` or `.iteraxis()` would be the best API. (The default could be `0` or `None`, or left undefined here.)
Once you define `__iter__`, `unstack` (with its default `axis=0`) really is always just the `tuple(arr)` one-liner.
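A minimal sketch of what an axis-aware iteration API could look like, written against NumPy. The name `iteraxis` is hypothetical (neither NumPy nor the standard defines it), and the `axis=0` default is just one of the options above. With that default it reproduces NumPy's row-wise iteration, i.e. `tuple(arr)`.

```python
import numpy as np


def iteraxis(x, axis=0):
    """Yield sub-arrays of x along `axis` (hypothetical helper, not a NumPy or standard API)."""
    if x.ndim == 0:
        raise TypeError("iteration over a 0-D array")
    # Move the requested axis to the front, then index along it.
    moved = np.moveaxis(x, axis, 0)
    for i in range(moved.shape[0]):
        yield moved[i, ...]


a = np.arange(6).reshape(2, 3)

# axis=0 matches NumPy's default iteration, i.e. tuple(a).
rows = tuple(iteraxis(a, axis=0))
assert all(np.array_equal(r, t) for r, t in zip(rows, tuple(a)))

# axis=1 yields the columns instead.
cols = tuple(iteraxis(a, axis=1))
print([c.tolist() for c in cols])  # [[0, 3], [1, 4], [2, 5]]
```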
Remember that the other argument was, e.g., SymPy, which doesn't use the list-of-lists analogy and iterates over all elements of its `Matrix`. The point is that conceptually there are other choices, and those choices might actually be better, were it not for the fact that most users are indoctrinated to the list-of-list view of things.
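To make the two conventions concrete, here is a small NumPy sketch. It does not call SymPy; it only uses NumPy's separate `.flat` iterator to mimic SymPy-style element-wise iteration next to NumPy's own row-wise `__iter__`.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# List-of-lists view (NumPy and the libraries discussed in this thread):
# iteration walks the first axis and yields 1-D rows.
rows = [r.tolist() for r in a]
print(rows)  # [[1, 2], [3, 4]]

# "Collection of elements" view: every element in row-major order.
# NumPy exposes this through .flat rather than through __iter__.
elements = [int(v) for v in a.flat]
print(elements)  # [1, 2, 3, 4]
```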
Well, my question here is specifically about the 1-D case. I don't think there is any ambiguity there, and based on the SciPy changes, it seems to be much more common.
Ah, so allow iteration on 1-D arrays. Not sure how important it is, but that makes sense to me.
And as you said, just having `__getitem__` already makes Python treat it as a sequence/iterable, I guess. So I don't really see a downside to it.
The 1-D limitation may be a bit awkward in practice, so I'm not sure it is a big advantage to promise that it works, but it is likely common enough.
Are there examples besides SymPy of linear indexing in Python? I know MATLAB does it, too, but I wonder why these matrix-centered implementations should govern what the array API does, given that NumPy, CuPy, PyTorch, JAX, TensorFlow, and dask.array seem to agree. Never mind if `x[i, ...]` is allowed for multidimensional arrays.
I agree it would be useful to document whether 1-D iteration is supported, must explicitly raise, or is undefined. The most important data point is: do all libraries currently allow 1-D iteration? Would you be able to check, @asmeurer?
> for the fact that most users are indoctrinated to the list-of-list view of things.
For the record: I don't think this is true, and I don't know of any data that could prove or disprove it either way. There are a lot of users who will think about this as 2-D/3-D regular grids and can visualize it like that (that's also the case for me), which is much more intuitive for, for example, physicists than "list of lists".
> which is much more intuitive for, for example, physicists
N-D is intuitive, but the question is what you think when you see `for x in arr`, and I think that is the list-of-list style of iteration.
And I have seen a lot of nested for loops over arrays even by users who work with NumPy quite a lot.
So yeah, it is intuitive for physicists. But I still think that, when it comes down to it, even many of those who find N-D intuitive will probably reach for the list-of-list analogy when they see a `for` loop. (Rather than one where you might just iterate over all elements because you see the array as a collection of elements first, with an N-D structure second.)
PyTorch, jax.numpy, dask.array, and, surprisingly, even `sparse` all allow 1-D iteration. They all actually seem to just follow NumPy on n-D iteration. (I didn't test CuPy, but it obviously behaves the same as NumPy.)
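A check like this can be scripted. The sketch below is not the script used for the results above; it assumes each library's namespace exposes `asarray` (true for the modules listed), skips libraries that are not installed, and omits CuPy (needs a GPU) and `sparse`. Other namespaces can be appended to the list if they provide `asarray`.

```python
import importlib


def supports_1d_iteration(module_name):
    """Return True/False for 1-D iteration support, or None if the module is not installed."""
    try:
        xp = importlib.import_module(module_name)
    except ImportError:
        return None
    x = xp.asarray([1, 2, 3])
    try:
        items = list(x)  # uses __iter__, or the __getitem__ fallback
    except TypeError:
        return False
    return len(items) == 3


# Illustrative list of array namespaces; extend as needed.
for name in ["numpy", "torch", "jax.numpy", "dask.array"]:
    print(name, supports_1d_iteration(name))
```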
In case it matters, I tested TF last night and it also seems to follow NumPy on n-D iteration.
```python
import tensorflow as tf

x = tf.constant([[1, 2, 3], [4, 5, 6]])
x[0]  # <tf.Tensor: shape=(3,), dtype=int32, numpy=array([1, 2, 3], dtype=int32)>
for row in x:
    print(row)
# tf.Tensor([1 2 3], shape=(3,), dtype=int32)
# tf.Tensor([4 5 6], shape=(3,), dtype=int32)
```
Same with xarray. I couldn't get Weld, Bohrium, Arkouda, or Legate to work on Colab, but ChatGPT tells me that in Weld a 2-D array would be a vector of vectors and in Arkouda a 2-D array would be a dictionary of 1-D arrays. Legate and Bohrium are supposed to be drop-in replacements for NumPy, so I would expect them to follow the same convention to the extent that they accept multidimensional input. I didn't test MXNet, since the project seems to have been retired.
Recently in array-api-strict, I accidentally disabled iteration on 1-D arrays. This broke a lot of code in SciPy. I've since reverted the change (array-api-strict disallows iteration on >1-D arrays but allows it for 1-D arrays).
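The policy described here (iterate 1-D arrays, reject everything else) can be sketched with a small wrapper. `StrictIterArray` is a hypothetical class for illustration, not array-api-strict's actual implementation. Because it defines `__iter__` explicitly, Python never falls back to `__getitem__`, so n-D arrays can refuse iteration with a clear error.

```python
import numpy as np


class StrictIterArray:
    """Hypothetical wrapper: iteration allowed on 1-D arrays only (not array-api-strict's code)."""

    def __init__(self, data):
        self._array = np.asarray(data)

    @property
    def ndim(self):
        return self._array.ndim

    def __getitem__(self, key):
        return self._array[key]

    def __iter__(self):
        # Checked eagerly (not inside a generator) so that iter() itself raises.
        if self.ndim != 1:
            raise TypeError(f"iteration is only supported on 1-D arrays, got {self.ndim}-D")
        return (self._array[i] for i in range(self._array.shape[0]))


print([int(e) for e in StrictIterArray([1, 2, 3])])  # [1, 2, 3]

try:
    iter(StrictIterArray([[1, 2], [3, 4]]))
except TypeError as exc:
    print(exc)  # iteration is only supported on 1-D arrays, got 2-D
```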
There have been discussions in the past about not allowing iteration on arrays (https://github.com/data-apis/array-api/issues/188). Disallowing it for higher-dimensional arrays is probably fine, but it's unclear whether a library like array-api-strict should disallow it for 1-D arrays. The reason is that, technically speaking, an array object that implements only the methods defined in the standard would allow iteration on 1-D arrays. This is because, by default, if `__iter__` is not defined but `__getitem__` is, Python defines iteration as `a[0]`, `a[1]`, etc., stopping when `__getitem__` raises `IndexError`.

Given how painful this can be for upstream code, I wonder if we should make it explicit in the standard that iteration is defined for 1-D arrays.
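The fallback can be seen with a minimal array-like that defines `__getitem__` but no `__iter__`. `GetitemOnly` is a hypothetical class that wraps a NumPy array. NumPy raises `IndexError` for out-of-bounds integers, which is what ends the loop. The same fallback yields rows on a 2-D object. Note that `collections.abc.Iterable` only checks for `__iter__`, so it does not recognize such an object even though `iter()` accepts it.

```python
from collections.abc import Iterable

import numpy as np


class GetitemOnly:
    """Hypothetical array-like exposing indexing but no __iter__ method."""

    def __init__(self, data):
        self._array = np.asarray(data)

    def __getitem__(self, key):
        # Out-of-bounds integers raise IndexError, which Python's
        # fallback iteration uses as its stop signal.
        return self._array[key]


one_d = GetitemOnly([1, 2, 3])
print([int(v) for v in one_d])  # [1, 2, 3]: a[0], a[1], a[2], then IndexError stops it

two_d = GetitemOnly([[1, 2], [3, 4]])
print([row.tolist() for row in two_d])  # [[1, 2], [3, 4]]: the fallback yields rows

print(isinstance(one_d, Iterable))  # False: the ABC only looks for __iter__
```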
A possible counterargument is that the new `unstack` function can be used to iterate over an array of any dimension. `unstack(x)` yields the same items as `iter(x)` in the NumPy sense of iteration (it iterates over the elements if `x` is 1-dimensional and along the first axis if it is n-dimensional).
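As a sketch of that equivalence, `unstack` along the first axis can be written with nothing but `shape` and indexing, so it does not depend on `__iter__` at all. `unstack_axis0` is a hypothetical helper, shown on NumPy arrays. For 1-D input it returns 0-D arrays, whose values match NumPy's iteration.

```python
import numpy as np


def unstack_axis0(x):
    """Sketch of unstack(x, axis=0) using only shape and indexing (hypothetical helper)."""
    return tuple(x[i, ...] for i in range(x.shape[0]))


v = np.array([1, 2, 3])
m = np.array([[1, 2], [3, 4]])

# 1-D: one 0-D array per element, same values as `for e in v`.
print([int(e) for e in unstack_axis0(v)])  # [1, 2, 3]

# n-D: one sub-array per index along the first axis, like `for row in m`.
print([r.tolist() for r in unstack_axis0(m)])  # [[1, 2], [3, 4]]
```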