Remove "use axis after broadcasting" and change it or unspecify it

seberg commented 9 months ago

@mhvk noticed here that the definitions here are incompatible with NumPy gufuncs which I think is an oversight that was never noted/discussed. The "after broadcasting" looks reasonable on first sight, but TBH, doesn't really seem reasonable to me:

Assume broadcasting does something, you are specifying an axes that by definition has length 1 along the array, and also by definition has little meaning for the array which gets broadcast.

Now, is the alternative of using the non-broadcast axes better? Maybe not, maybe it is also very unclear and any positive axes (i.e. where broadcasting matters) is just too strange.

But, whether you like what NumPy does or not after thinking about it, it does seem if anything at least (maybe moreso) logical than the alternative.

So, I propose to change any place where broadcasting is mentioned w.r.t. to axes handling to define it as unspecified. Negative axis remain allowed, as it doesn't interfere with broadcasting. Negative axis beyond the number of existing dimensions prior to broadcasting will also be unspecified. The last point looks necessary to me and strengthens the argument for NumPy's (or unspecified/error) logic: matmul([1], [[2]], axis=(-2, -1)) should not be valid, since it might as well be a bug that the first array is not 2-D as it should be).

EDIT: The above example should use vecdot, matmul doesn't work: vecdot(1, [2], axis=-1)

seberg commented 9 months ago

I will note that this applies to vecdot. I have to see what other functions we have to be sure if it applies to other functions. In some cases, the axis might be more like an operation followed by a reduction and not like a core dimensions. For such cases the broadcast specification is fine.

mhvk commented 9 months ago

I agree with your general point about not interpreting axis as after broadcasting. For gufunc at least, I think we do not have to change anything, as axis is very much just a shortcut for axes=[axis]*npp and for axes it is for each operand separately, so there is no ambiguity.

seberg commented 9 months ago

I think we do not have to change anything

The point is changing the text here. It is a bit unclear, but does seem to suggest to first broadcast. And I suspect that was the intention when it was written.

EDIT: To be a bit clearer:

axis over which to compute the dot product. Must be an integer on the interval [-N, N), where N is the rank (number of dimensions) of the shape determined according to Broadcasting.

Notes "broadcasting", which to me suggests that the axis of the broadcasted arrays is taken much as if we were doing (x * y).sum(axis), which is not compatible with NumPy gufuncs.

mhvk commented 9 months ago

Ah, I see, yes, that text is inconsistent with the gufunc implementation. For that text, I agree that axis may be best defined as always negative - it really does not make any sense to even talk about dot if the axis dimension is broadcast (EDIT: which the gufunc definition automatically prevents)

kgryte commented 9 months ago

@seberg I believe this issue is related to https://github.com/data-apis/array-api/issues/617, correct?

seberg commented 9 months ago

It is the same issue yes. But to note that this needs resolution (or rather for the sake of NUmPy, I will assume it will be resolved by either aligning with NumPy or unspecifying it).

asmeurer commented 8 months ago

Looking at this and https://github.com/data-apis/array-api/issues/617 again. I agree with @seberg here. I didn't realize that gufuncs use the axis before broadcasting. If gufuncs are doing this, then that makes this case stronger since the spec is disagreeing with gufunc behavior here. So I agree completely that we should change the spec to only specify axis when it is negative. Users can work around this by manually broadcasting first, but if it really becomes an issue, I would suggest adding axes (or allowing a list of tuples for axis), so that the axes of each array can be specified independently.

In short, specifying an axis uniformly for multiple arrays (i.e., axis as a single integer rather than a tuple or list of tuples of integers), is inherently ambiguous when the axis is nonnegative and the arrays are broadcasted. The most logical behavior (which numpy gufuncs follow) is for the axis to refer to the pre-broadcasted (i.e., actual) array shapes. But the dimensions referred to by axis are different than the one referred to in the final broadcasted shape. This results in a situation where func(x1, x2) is not the same as func(*broadcast_arrays(x1, x2)).

I will also note here that my implementation of vecdot in array-api-strict (née numpy.array_api) and array-api-compat follows the spec, and therefore is different from the new np.vecdot gufunc:

>>> np.vecdot(np.empty((1, 2, 3, 4, 5)), np.empty((3, 4, 5)), axis=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: vecdot: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n),(n)->() (size 5 is different from 3)
>>> import numpy.array_api as xp
>>> xp.vecdot(xp.empty((1, 2, 3, 4, 5)), xp.empty((3, 4, 5)), axis=2)
Array([[[[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]]]], dtype=np.array_api.float64)

(I also apparently never implemented broadcasting for cross in np.array_api. It works correctly in array-api-compat though because that just wraps np.cross)

As far as I can tell, this is only an issue for vecdot and cross. Those are the only functions in the linalg extension that have an axis argument, other than tensordot, which does not have this issue because the axes are specified for each array separately already in that function. There are no functions outside of linalg with more than one argument and an axis argument. This could in principle apply to more of the linear algebra functions in the future, however. For instance, there's no reason why matmul couldn't gain an axis/axes argument to allow contracting over different dimensions than the last two.

I think it would be good to get this fixed for 2023 if possible. I will make a pull request.

asmeurer commented 8 months ago

Fix at https://github.com/data-apis/array-api/issues/740

mhvk commented 8 months ago

Thanks, that all makes sense. Just for completeness, np.matmul (like all gufuncs) does have an axes argument that allows specifying axes for each array separately (pre-broadcasting): https://numpy.org/doc/stable/reference/ufuncs.html

axes

New in version 1.15.

A list of tuples with indices of axes a generalized ufunc should operate on. For instance, for a signature of (i,j),(j,k)->(i,k) appropriate for matrix multiplication, the base elements are two-dimensional matrices and these are taken to be stored in the two last axes of each argument. The corresponding axes keyword would be [(-2, -1), (-2, -1), (-2, -1)]. For simplicity, for generalized ufuncs that operate on 1-dimensional arrays (vectors), a single integer is accepted instead of a single-element tuple, and for generalized ufuncs for which all outputs are scalars, the output tuples can be omitted.

asmeurer commented 8 months ago

Yes, I rather like that feature, although I'm not sure if it's appropriate for the array API. Maybe if a use-case comes up.

seberg commented 8 months ago

I think swapping axes is fine, for matmul it does allow for slightly nicer vector-matrix multiplication maybe (but not sure it is really nicer than inserting the dimension). Overall: Not a particularly important feature, ~swapaxes~ moveaxis can get there when needed and it's probably not needed often.

data-apis / array-api

Remove "use axis after broadcasting" and change it or unspecify it #724