oleksandr-pavlyk opened this issue 3 months ago
Sorry for the late reply @oleksandr-pavlyk. Thinking about this more, I am not sure it makes sense to add a `stream` argument to `from_dlpack`. Like all other array constructors in the standard, I think we've implicitly assumed (which would have been better to call out explicitly, I agree) that there's a global, implicit "current stream/queue" to which we submit workloads. It is this implicit queue that we'd use inside `from_dlpack` (by passing it as `stream` to `__dlpack__`). Do you think we should add a `stream` argument to all constructors (in addition to the already-existing `device`)?
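For concreteness, here is a minimal sketch of that implicit-queue model; `get_current_stream_for` and `_wrap_capsule` are hypothetical stand-ins for consumer-library internals, not anything in the standard:

```python
def from_dlpack(x):
    # Ask the producer where the data lives.
    device_type, device_id = x.__dlpack_device__()
    # Look up the consumer's implicit "current stream/queue" for that device
    # (hypothetical helper; each library has its own notion of this).
    stream = get_current_stream_for(device_type, device_id)
    # The producer makes its pending work visible to `stream` before returning.
    capsule = x.__dlpack__(stream=stream)
    return _wrap_capsule(capsule)  # hypothetical consumer-side wrapper
```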
No, I am not advocating for adding `stream` keywords everywhere, not even to `from_dlpack`.
I am pointing out, though, that the Array API standard currently leaves those who need to use the `stream` keyword argument of `__dlpack__` stranded, as there is no function in the standard to consume the DLPack capsule produced by a `__dlpack__` call.
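For example, with CuPy as the producer today, the exporter's stream has to be synchronized by hand before handing the array to a consumer (`consumer_lib` below is a placeholder for any consuming library):

```python
import cupy as cp

x = cp.arange(10)
# The standard's from_dlpack cannot forward a `stream` to cupy's __dlpack__,
# so the producer's current stream must be synchronized explicitly first.
cp.cuda.get_current_stream().synchronize()
y = consumer_lib.from_dlpack(x)  # placeholder consumer
```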
I was saying that adding a `stream` keyword to `from_dlpack` does not really work out, as a non-default value of `stream` is only pertinent to data interchanges that do not change the device the data reside on.
I think the problem is now clearer to me (thanks @oleksandr-pavlyk and @leofang). The basic problem for the SYCL-based `dpctl` library is that the return value from `__dlpack_device__` is not enough, because SYCL can target multiple device types, and hence the return value can be both `ONE_API` and `CUDA` if the data lives on a CUDA device managed by SYCL.

The device enum was supposed to have mutually exclusive values: data lives either on CPU, or on CUDA, or on ROCm, etc., but never on two device types at once. Hence I think having `ONE_API` there was a mistake.
A potential way to handle this would be to return two device values ("this is a SYCL oneAPI array backed by a CUDA device"), and then let the consumer ask for either oneAPI or CUDA (e.g., CuPy would ask for CUDA, while a user of `torch.xpu` could ask for oneAPI instead).
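A hedged sketch of how such negotiation could look on the consumer side, assuming a hypothetical protocol change where `__dlpack_device__` returns all device types the data is addressable under:

```python
kDLCUDA, kDLOneAPI = 2, 14  # DLDeviceType values from dlpack.h

SUPPORTED = {kDLCUDA}  # e.g. CuPy only understands CUDA

def negotiate_device(x):
    # Hypothetical: __dlpack_device__ returns a tuple of (type, id) pairs,
    # e.g. ((kDLOneAPI, 0), (kDLCUDA, 0)) for a SYCL array on a CUDA device.
    for device_type, device_id in x.__dlpack_device__():
        if device_type in SUPPORTED:
            return device_type, device_id
    raise BufferError("none of the producer's device types are supported")
```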
I think the `stream` part of the problem statement in the issue description may not be relevant. First, the non-uniqueness of the device enum must be sorted out. A stream value is tied to a device type, and `stream` only seems confusing here because device uniqueness is the underlying problem. If we know we're on CUDA, `stream` should work as designed.
Indeed. Here's the way I understand it, using `cupy` and `dpctl` as a concrete example.

`cupy.from_dlpack(dpctl.tensor.usm_ndarray)`:

- `cupy` is the consumer, `dpctl` is the producer
- `dpctl.tensor.__dlpack_device__` is queried, returning `kDLOneAPI` and an (opaque?) device ID
- `cupy` does not know how to interpret `kDLOneAPI` (even if it's backed by CUDA), so it raises a `BufferError` to `dpctl` users ❌

`dpctl.tensor.from_dlpack(cupy.ndarray)`:

- `cupy` is the producer, `dpctl` is the consumer
- `cupy.ndarray.__dlpack_device__` is queried, returning `kDLCUDA` and a device ID
- `dpctl` can query the oneAPI runtime and see if CUDA is in use under the hood, so no `BufferError` is raised ✅
- `dpctl` passes a meaningful `stream` integer to `cupy.ndarray.__dlpack__` to handshake, establish stream order, and get a capsule ✅ (sketched below)
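A sketch of that working direction (the last two bullets), with hypothetical helpers standing in for whatever `dpctl` actually does internally:

```python
kDLCUDA = 2  # DLDeviceType value from dlpack.h

def dpctl_style_from_dlpack(x, sycl_queue):
    """Hedged sketch of the consumer-side handshake; not dpctl's real code."""
    device_type, device_id = x.__dlpack_device__()
    if device_type != kDLCUDA:
        raise BufferError("this sketch only handles CUDA-backed producers")
    # Hypothetical: recover the native CUDA stream underlying the SYCL queue,
    # so the producer can establish stream order against it.
    cuda_stream = native_cuda_stream_of(sycl_queue)
    capsule = x.__dlpack__(stream=cuda_stream)  # producer syncs to our stream
    return usm_ndarray_from_capsule(capsule, sycl_queue)  # hypothetical
```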
Based on the discussion in dmlc/dlpack#57, the `array.__dlpack__` method supports a `stream` keyword, implementing @tqchen's suggestion for synchronization semantics.

The `from_dlpack` function in the array API does not support it at present though, requiring users to synchronize explicitly on the exporter's side.

The `from_dlpack` function has acquired a `device` keyword (see #626) to permit porting device data to host, e.g. `from_dlpack(gpu_arr, device=(kDLCPU, 0))`, and, potentially, to allow exchanges between different APIs targeting the same device (e.g., between `kDLOneAPI` and `kDLCUDA` for oneAPI devices targeting NVIDIA GPUs) down the road.

It is tempting to add support for a `stream` keyword to `from_dlpack`, but this keyword's value must make sense for the importing library, while the `stream` value passed to `__dlpack__` must make sense for the exporting library. So it seems that `from_dlpack` should only allow specifying a non-default `stream` value when it can make sense for both, perhaps when the device type of the requested device is the same as the type indicated in `arr.__dlpack_device__`. A sketch of that rule follows below.

@kkraus14 @fcharras @leofang @ogrisel
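A minimal sketch of that restriction, assuming a hypothetical `stream` keyword on `from_dlpack` that is honored only when the requested device type matches the producer's:

```python
def from_dlpack(x, /, *, device=None, stream=None):
    # Hedged sketch of the rule proposed above, not standard text.
    producer_type, producer_id = x.__dlpack_device__()
    requested_type = producer_type if device is None else device[0]
    if stream is not None and requested_type != producer_type:
        # A cross-device-type transfer is requested; a consumer-side stream
        # value has no defined meaning for the exporting library.
        raise ValueError("stream= requires the requested device type to "
                         "match x.__dlpack_device__()")
    capsule = x.__dlpack__(stream=stream)
    return _wrap_capsule(capsule)  # hypothetical consumer-side wrapper
```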