Bears-R-Us / arkouda

Arkouda (αρκούδα): Interactive Data Analytics at Supercomputing Scale :bear:

bug in array_api.matmul for arrays of different shapes #3871

Open ajpotts opened 2 weeks ago

ajpotts commented 2 weeks ago

Describe the bug
Matrix multiplication does not work for all shapes where it should. I suspect this is because the broadcast_dims function was written to support binary ops, which have different broadcasting needs.

To Reproduce

In [2]: import arkouda.array_api as xp
/home/amandapotts/git/arkouda/arkouda/array_api/__init__.py:287: UserWarning: The arkouda.array_api submodule is still experimental.
  warnings.warn("The arkouda.array_api submodule is still experimental.")

In [4]: arrayLeft = xp.asarray(ak.randint(0, 10, (2, 5), dtype="int64"))

In [5]: arrayRight = xp.asarray(ak.randint(0, 10, (5, 3), dtype="int64"))

In [6]: akProduct = xp.matmul(arrayLeft, arrayRight)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/git/arkouda/arkouda/pdarrayclass.py:4000, in broadcast_if_needed(x1, x2)
   3998 try:
   3999     # determine common shape for broadcasting
-> 4000     bc_shape = broadcast_dims(x1.shape, x2.shape)
   4001 except ValueError:

File ~/git/arkouda/arkouda/util.py:393, in broadcast_dims(sa, sb)
    392 else:
--> 393     raise ValueError("Incompatible dimensions for broadcasting")
    395 i -= 1

ValueError: Incompatible dimensions for broadcasting

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[6], line 1
----> 1 akProduct = xp.matmul(arrayLeft, arrayRight)

File ~/git/arkouda/arkouda/array_api/linalg.py:18, in matmul(x1, x2)
     13 if x1._array.ndim < 2 and x2._array.ndim < 2:
     14     raise ValueError(
     15         "matmul requires at least one array argument to have more than two dimensions"
     16     )
---> 18 x1b, x2b, tmp_x1, tmp_x2 = broadcast_if_needed(x1._array, x2._array)
     20 repMsg = generic_msg(
     21     cmd=f"matMul{len(x1b.shape)}D",
     22     args={
   (...)
     25     },
     26 )
     28 if tmp_x1:

File ~/anaconda3/envs/arkouda-dev/lib/python3.12/site-packages/typeguard/__init__.py:891, in typechecked.<locals>.wrapper(*args, **kwargs)
    889 memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
    890 check_argument_types(memo)
--> 891 retval = func(*args, **kwargs)
    892 check_return_type(retval, memo)
    894 # If a generator is returned, wrap it if its yield/send/return types can be checked

File ~/git/arkouda/arkouda/pdarrayclass.py:4002, in broadcast_if_needed(x1, x2)
   4000     bc_shape = broadcast_dims(x1.shape, x2.shape)
   4001 except ValueError:
-> 4002     raise ValueError(
   4003         f"Incompatible array shapes for broadcasted operation: {x1.shape} and {x2.shape}"
   4004     )
   4006 # broadcast x1 if needed
   4007 if bc_shape != x1.shape:

ValueError: Incompatible array shapes for broadcasted operation: (2, 5) and (5, 3)
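For reference, these shapes form an ordinary 2-D matrix product, so any Array-API-conforming matmul should accept them. NumPy, for example, does (a quick sanity check, not Arkouda code):

```python
import numpy as np

# Same shapes as the failing Arkouda example above.
a = np.random.randint(0, 10, (2, 5))
b = np.random.randint(0, 10, (5, 3))

# (2, 5) @ (5, 3) is a plain matrix product; the result shape is (2, 3).
print(np.matmul(a, b).shape)  # (2, 3)
```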
drculhane commented 2 weeks ago

Line 21 above issues a call to generic_msg with command = "matMul". I'm puzzled. I don't see this anywhere in the Chapel code. There is a matmul in LinalgMsg.chpl, but it's pretty clearly not the same thing.

drculhane commented 2 weeks ago

I think you're right about this broadcast_dims function.

drculhane commented 2 weeks ago

Still looking at this. This website is cited in the code: https://data-apis.org/array-api/latest/API_specification/broadcasting.html#algorithm

It describes the algorithm as being "for the purpose of making arrays with different shapes have compatible shapes for element-wise operations." To the best of my knowledge, experience, etc., that's got nothing to do with matrix multiplication. But I'm still digging.
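A minimal sketch of that element-wise algorithm (my own paraphrase of the spec, not Arkouda's actual broadcast_dims) shows why it rejects these shapes:

```python
def broadcast_shapes_elementwise(sa, sb):
    """Element-wise broadcasting per the Array API spec: pad the
    shorter shape with leading 1s, then each pair of dims must be
    equal, or one of them must be 1."""
    n = max(len(sa), len(sb))
    sa = (1,) * (n - len(sa)) + tuple(sa)
    sb = (1,) * (n - len(sb)) + tuple(sb)
    out = []
    for a, b in zip(sa, sb):
        if a == b or a == 1 or b == 1:
            out.append(max(a, b))
        else:
            raise ValueError("Incompatible dimensions for broadcasting")
    return tuple(out)

print(broadcast_shapes_elementwise((2, 5), (1, 5)))  # (2, 5) -- fine for binops
# (2, 5) vs (5, 3): 2 vs 5, and neither is 1 -> ValueError,
# even though matmul((2,5), (5,3)) is perfectly legal.
```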

drculhane commented 2 weeks ago

The answers might be in here: https://data-apis.org/array-api/latest/API_specification/generated/array_api.matmul.html#array_api.matmul

Still digesting.

drculhane commented 2 weeks ago

Bottom line: broadcasting for matmul is different from broadcasting for binops, exactly as you said. I now have Python code that appears to broadcast to the right shapes for matmul, but the program still bombs at the point where the Python code tries to invoke a Chapel command (matMul1D, matMul2D, etc.) that doesn't currently exist. So the next step, I suppose, is to adapt the existing matmul function in LinalgMsg.chpl.
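For comparison, here is a sketch (my own reading of the Array API matmul spec, not the fix that was actually written) of how matmul shape resolution differs: only the leading "batch" dims broadcast element-wise, while the last two dims are checked as an (n, k) x (k, m) product. The 1-D promotion rules from the spec are omitted for brevity, so this assumes both inputs are at least 2-D.

```python
def matmul_broadcast_shapes(sa, sb):
    """Result shape of matmul per the Array API spec, assuming
    both inputs are at least 2-D."""
    *batch_a, n, k1 = sa
    *batch_b, k2, m = sb
    if k1 != k2:
        raise ValueError(f"inner dimensions do not match: {k1} vs {k2}")
    # Only the leading batch dims broadcast element-wise.
    nb = max(len(batch_a), len(batch_b))
    batch_a = [1] * (nb - len(batch_a)) + batch_a
    batch_b = [1] * (nb - len(batch_b)) + batch_b
    batch = []
    for a, b in zip(batch_a, batch_b):
        if a == b or a == 1 or b == 1:
            batch.append(max(a, b))
        else:
            raise ValueError("incompatible batch dimensions")
    return tuple(batch) + (n, m)

print(matmul_broadcast_shapes((2, 5), (5, 3)))     # (2, 3)
print(matmul_broadcast_shapes((4, 2, 5), (5, 3)))  # (4, 2, 3) -- batch dim broadcast
```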