Closed seberg closed 7 months ago
I will note that this applies to vecdot
. I have to see what other functions we have to be sure if it applies to other functions. In some cases, the axis
might be more like an operation followed by a reduction and not like a core dimensions. For such cases the broadcast specification is fine.
I agree with your general point about not interpreting axis
as after broadcasting. For gufunc
at least, I think we do not have to change anything, as axis
is very much just a shortcut for axes=[axis]*npp
and for axes
it is for each operand separately, so there is no ambiguity.
I think we do not have to change anything
The point is changing the text here. It is a bit unclear, but does seem to suggest to first broadcast. And I suspect that was the intention when it was written.
EDIT: To be a bit clearer:
axis over which to compute the dot product. Must be an integer on the interval [-N, N), where N is the rank (number of dimensions) of the shape determined according to Broadcasting.
Notes "broadcasting", which to me suggests that the axis of the broadcasted arrays is taken much as if we were doing (x * y).sum(axis)
, which is not compatible with NumPy gufuncs.
Ah, I see, yes, that text is inconsistent with the gufunc
implementation. For that text, I agree that axis
may be best defined as always negative - it really does not make any sense to even talk about dot
if the axis
dimension is broadcast (EDIT: which the gufunc definition automatically prevents)
@seberg I believe this issue is related to https://github.com/data-apis/array-api/issues/617, correct?
It is the same issue yes. But to note that this needs resolution (or rather for the sake of NUmPy, I will assume it will be resolved by either aligning with NumPy or unspecifying it).
Looking at this and https://github.com/data-apis/array-api/issues/617 again. I agree with @seberg here. I didn't realize that gufuncs use the axis before broadcasting. If gufuncs are doing this, then that makes this case stronger since the spec is disagreeing with gufunc behavior here. So I agree completely that we should change the spec to only specify axis when it is negative. Users can work around this by manually broadcasting first, but if it really becomes an issue, I would suggest adding axes
(or allowing a list of tuples for axis
), so that the axes of each array can be specified independently.
In short, specifying an axis uniformly for multiple arrays (i.e., axis
as a single integer rather than a tuple or list of tuples of integers), is inherently ambiguous when the axis is nonnegative and the arrays are broadcasted. The most logical behavior (which numpy gufuncs follow) is for the axis to refer to the pre-broadcasted (i.e., actual) array shapes. But the dimensions referred to by axis
are different than the one referred to in the final broadcasted shape. This results in a situation where func(x1, x2)
is not the same as func(*broadcast_arrays(x1, x2))
.
I will also note here that my implementation of vecdot
in array-api-strict (née numpy.array_api) and array-api-compat follows the spec, and therefore is different from the new np.vecdot gufunc:
>>> np.vecdot(np.empty((1, 2, 3, 4, 5)), np.empty((3, 4, 5)), axis=2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: vecdot: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n),(n)->() (size 5 is different from 3)
>>> import numpy.array_api as xp
>>> xp.vecdot(xp.empty((1, 2, 3, 4, 5)), xp.empty((3, 4, 5)), axis=2)
Array([[[[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]],
[[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]]]], dtype=np.array_api.float64)
(I also apparently never implemented broadcasting for cross
in np.array_api. It works correctly in array-api-compat though because that just wraps np.cross
)
As far as I can tell, this is only an issue for vecdot
and cross
. Those are the only functions in the linalg extension that have an axis argument, other than tensordot
, which does not have this issue because the axes are specified for each array separately already in that function. There are no functions outside of linalg
with more than one argument and an axis
argument. This could in principle apply to more of the linear algebra functions in the future, however. For instance, there's no reason why matmul
couldn't gain an axis
/axes
argument to allow contracting over different dimensions than the last two.
I think it would be good to get this fixed for 2023 if possible. I will make a pull request.
Thanks, that all makes sense. Just for completeness, np.matmul
(like all gufuncs) does have an axes
argument that allows specifying axes for each array separately (pre-broadcasting): https://numpy.org/doc/stable/reference/ufuncs.html
axes
New in version 1.15.
A list of tuples with indices of axes a generalized ufunc should operate on. For instance, for a signature of (i,j),(j,k)->(i,k) appropriate for matrix multiplication, the base elements are two-dimensional matrices and these are taken to be stored in the two last axes of each argument. The corresponding axes keyword would be [(-2, -1), (-2, -1), (-2, -1)]. For simplicity, for generalized ufuncs that operate on 1-dimensional arrays (vectors), a single integer is accepted instead of a single-element tuple, and for generalized ufuncs for which all outputs are scalars, the output tuples can be omitted.
Yes, I rather like that feature, although I'm not sure if it's appropriate for the array API. Maybe if a use-case comes up.
I think swapping axes is fine, for matmul
it does allow for slightly nicer vector-matrix multiplication maybe (but not sure it is really nicer than inserting the dimension).
Overall: Not a particularly important feature, ~swapaxes
~ moveaxis
can get there when needed and it's probably not needed often.
@mhvk noticed here that the definitions here are incompatible with NumPy gufuncs which I think is an oversight that was never noted/discussed. The "after broadcasting" looks reasonable on first sight, but TBH, doesn't really seem reasonable to me:
Assume broadcasting does something, you are specifying an axes that by definition has length 1 along the array, and also by definition has little meaning for the array which gets broadcast.
Now, is the alternative of using the non-broadcast axes better? Maybe not, maybe it is also very unclear and any positive axes (i.e. where broadcasting matters) is just too strange.
But, whether you like what NumPy does or not after thinking about it, it does seem if anything at least (maybe moreso) logical than the alternative.
So, I propose to change any place where broadcasting is mentioned w.r.t. to axes handling to define it as unspecified. Negative axis remain allowed, as it doesn't interfere with broadcasting. Negative axis beyond the number of existing dimensions prior to broadcasting will also be unspecified. The last point looks necessary to me and strengthens the argument for NumPy's (or unspecified/error) logic:
matmul([1], [[2]], axis=(-2, -1))
should not be valid, since it might as well be a bug that the first array is not 2-D as it should be).EDIT: The above example should use
vecdot
, matmul doesn't work:vecdot(1, [2], axis=-1)