I am not able to dig deep into this right now, but there are a couple of additional potential hurdles for the CUDA target:
- The CUDA ufunc mechanism doesn't yet support dynamic ufuncs, which the CPU-target implementation in the PR appears to rely on.
- The `__array_ufunc__` mechanism seems CPU-centric - will we need to define a new `__cuda_array_ufunc__` mechanism for this to be practical? (cf. `__cuda_array_interface__` vs. `__array_interface__`.)
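To illustrate the first hurdle: on the CPU target, `@vectorize` with no signature list produces a dynamic ufunc (`DUFunc`) that compiles lazily for each new input type, whereas the CUDA target currently requires an explicit signature list up front. A minimal sketch:

```python
from numba import vectorize

# CPU target: no signatures given, so this creates a dynamic ufunc
# (DUFunc) that compiles on first call for each new input dtype.
@vectorize
def add(x, y):
    return x + y

add(1, 2)      # triggers lazy compilation for int64
add(1.5, 2.5)  # triggers a second compilation for float64

# CUDA target: signatures must be listed explicitly, and there is no
# dynamic-ufunc equivalent yet, e.g.:
# @vectorize(['float32(float32, float32)'], target='cuda')
# def add_gpu(x, y):
#     return x + y
```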
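For context on the second hurdle, this is the CPU-side protocol the PR builds on: when an operand defines `__array_ufunc__`, NumPy hands the ufunc call to it instead of executing the ufunc itself. A minimal sketch with a hypothetical wrapper class (`LoggedArray` is illustrative, not from the PR):

```python
import numpy as np

class LoggedArray:
    """Hypothetical array wrapper that intercepts ufunc calls."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap any LoggedArray operands, run the underlying ufunc,
        # and re-wrap the result so the wrapper type propagates.
        raw = [x.data if isinstance(x, LoggedArray) else x for x in inputs]
        result = getattr(ufunc, method)(*raw, **kwargs)
        return LoggedArray(result)

a = LoggedArray([1.0, 2.0])
out = np.add(a, 3.0)  # NumPy dispatches through a.__array_ufunc__
print(out.data)       # [4. 5.]
```

A device-side analogue would presumably need a similar hook that CUDA ufunc dispatch consults, which is what the `__cuda_array_ufunc__` question above is getting at.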
Details / notes:
From "Allow libraries that implement `__array_ufunc__` to override CUDAUFuncDispatcher" on the Numba Discourse.
There was a PR implementing this for the CPU target: https://github.com/numba/numba/pull/8995
A related issue on the Awkward issue tracker: https://github.com/scikit-hep/awkward/issues/3179
This is to support using Coffea on CUDA.
cc @ianna @lgray