data-apis / array-api

RFC document, tooling and other content related to the array API standard
https://data-apis.github.io/array-api/latest/
MIT License
215 stars 45 forks source link

Support for DLPack exchange of hot buffers #829

Open oleksandr-pavlyk opened 3 months ago

oleksandr-pavlyk commented 3 months ago

Based on discussion in dmlc/dlpack#57, array.__dlpack__ method supports stream keyword, implementing @tqchen's suggestion for synchronization semantics.

The from_dlpack command in array API does not support it at present though, requiring users to explicitly synchronize at the exporter's side.

The from_dlpack function has acquired device keyword (see #626) to permit porting device data to host, e.g. from_dlpack(gpu_arr, device=(kDLCPU, 0)) and, potentially, to allow exchanges between different APIs targeting the same device (e.g., between kDLOneAPI and kDLCUDA for oneAPI devices targeting NVidia GPUs) down the road.

It is tempting to want to add support for stream keyword to from_dlpack, but this keyword value must make sense for importing library, while stream value passed to __dlpack__ must make sense for exporting library. So it seems that from_dlpack should only allow specifying non-default value stream keyword when it can make sense for both. Maybe when device type of requested device is the same as the type indicated in arr.__dlpack_device__.

@kkraus14 @fcharras @leofang @ogrisel

leofang commented 2 months ago

Sorry for late reply @oleksandr-pavlyk. Thinking about this more I am not sure it makes sense to add a stream argument to from_dlpack. Like all other array constructors in the standard, I think we've implicitly assumed (which would have been better to shout out, that I agree) that there's a global, implicit "current stream/queue" to which we submit the workloads. It is this implicit queue that we'd use inside from_dlpack (by passing it as stream to __dlpack__). Do you think we should add a stream argument to all constructors (in addition to already-existing device)?

oleksandr-pavlyk commented 2 months ago

No, I am not advocating for adding stream keywords everywhere, not even to from_dlpack.

I am pointing out though that Array API standard currently leaves those who need to use stream keyword argument in __dlpack__ stranded, as there is no function in the standard to consume the DLPack capsule produced by __dlpack__ call.

I was saying that adding stream keyword to from_dlpack does not really work out, as a non-default value of stream is only pertinent to data interchanges that do not change the device data reside on.

rgommers commented 1 month ago

I think the problem is now clearer to me (thanks @oleksandr-pavlyk and @leofang). The basic problem for the SYCL-based dpctl library is that the return value from __dlpack_device__ is not enough, because SYCL can target multiple device types and hence the return value can be ONE_API and CUDA if the data lives on a CUDA device managed by SYCL.

The device enum was supposed to have mutually exclusive values: data lives either on CPU, or on CUDA, or on ROCm, etc. - but never on two device types at once. Hence I think having ONE_API there was a mistake.

A potential way to handle this would be to return two device values ("this is a SYCL OneAPI array backed by a CUDA device"), and then let the consumer ask for either OneAPI or CUDA (e.g., CuPy would ask for CUDA, while a user of torch.xpu could ask for OneAPI instead).

I think the stream part of the problem statement in the issue description may not be relevant. First, the non-uniqueness of the device enum must be sorted out. A stream value is connected to device type, and stream seems to become confusing here because the device uniqueness is a problem. If we know we're on CUDA, stream should work as designed.

leofang commented 1 month ago

Indeed. Here's the way I understand it, using cupy and dpctl as a concrete example