IntelPython / dpctl

Python SYCL bindings and SYCL-based Python Array API library
https://intelpython.github.io/dpctl/
Apache License 2.0

__dlpack_device__() returned numbers #1397

Closed wozna closed 5 months ago

wozna commented 1 year ago

Hi, I have a question about DLPack results. I created a dpnp array, then called __dlpack_device__() and got DLDeviceType = 14 (kDLOneAPI) and device_id = 3. Could you help me understand what this 3 means? When I run sycl-ls, I get this output:

[opencl:cpu:0] Intel(R) OpenCL, Intel(R) Xeon(R) Gold [...]
[opencl:acc:1] Intel(R) FPGA Emulation Platform for OpenCL(TM) [...]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max [...]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Data Center GPU Max [...]

When I checked the values in the dltensor, it showed level_zero:gpu:0.

Here is a code example:

import dpnp as dnp

if __name__ == "__main__":
    first_number, second_number = dnp.arange(100, dtype=dnp.float32).__dlpack_device__()
    print(first_number)  # result 14
    print(second_number)  # result 3

oleksandr-pavlyk commented 1 year ago

@wozna The tuple returned by usm_ndarray.__dlpack_device__() is (accelerator/framework identifier, device_id).

The framework identifier is 14 (enumerator kDLOneAPI), as you have realized, and the device_id is the stable numeric ordinal of the root (unpartitioned) device, consistent with the SYCL runtime. It is the position of the device in the device vector returned by the static method sycl::device::get_devices(), which is exposed to Python as dpctl.get_devices(). A filter selector string consisting of just this number reconstructs the unpartitioned SYCL device:

In [1]: import dpctl.tensor as dpt

In [2]: x = dpt.arange(10, dtype="f4")

In [3]: x.device
Out[3]: Device(level_zero:gpu:0)

In [4]: x.__dlpack_device__()
Out[4]: (14, 2)

In [5]: import dpctl

In [6]: x.sycl_device == dpctl.SyclDevice("2")
Out[6]: True

In [7]: x.sycl_device == dpctl.get_devices()[2]
Out[7]: True

oleksandr-pavlyk commented 1 year ago

@wozna Let me know if you have further questions. Feel free to resolve if not.

wozna commented 1 year ago

@oleksandr-pavlyk Thank you for the answer. So if I have a dltensor, the only way to find out which kind of device the tensor is allocated on (CPU or XPU) is to call SYCL and look up device_id in sycl::device::get_devices()?

oleksandr-pavlyk commented 1 year ago

@wozna Yes, that is correct. Handling a kDLOneAPI device requires a call to the SYCL runtime.
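
As a minimal sketch of that lookup from Python: the helper names below (resolve_oneapi_device, describe_array_device) are hypothetical and not part of dpctl. The sketch assumes only that dpctl.get_devices() returns root devices in the same order as sycl::device::get_devices(), as described earlier in this thread.

```python
K_DL_ONEAPI = 14  # DLDeviceType value for kDLOneAPI, as reported in this thread


def resolve_oneapi_device(dlpack_device, root_devices):
    """Map a (device_type, device_id) tuple to an entry of root_devices.

    root_devices is expected to be the list returned by dpctl.get_devices();
    it is passed in explicitly so the lookup can be tried without a SYCL runtime.
    """
    device_type, device_id = dlpack_device
    if device_type != K_DL_ONEAPI:
        raise ValueError(f"not a kDLOneAPI device: type {device_type}")
    if not 0 <= device_id < len(root_devices):
        raise IndexError(f"device_id {device_id} is out of range")
    return root_devices[device_id]


def describe_array_device(x):
    """Return (filter string, kind) for an array exposing __dlpack_device__()."""
    import dpctl  # imported lazily: requires dpctl and a SYCL runtime

    dev = resolve_oneapi_device(x.__dlpack_device__(), dpctl.get_devices())
    kind = "cpu" if dev.is_cpu else "gpu" if dev.is_gpu else "other"
    return dev.filter_string, kind
```

With dpctl installed, describe_array_device(dpt.arange(10, dtype="f4")) would report the root device the array lives on. The result depends on the machine, since the ordinal is only meaningful relative to that machine's device list.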

wozna commented 1 year ago

@oleksandr-pavlyk Now it is clear to me, thank you.

wozna commented 1 year ago

I have one more question, about XPU tiles in the context of DLPack. The device_id in a dltensor tells us only which device the memory is allocated on, not which tile. So if we have the data pointer from a dltensor, how do we know which tile it is on? Do we need to know that in order to implement zero-copy from_dlpack or to_dlpack?

oleksandr-pavlyk commented 1 year ago

Great question @wozna. It is possible to share tile allocations made using the default-platform context.

Steps for exporting DLPack for tile-allocated memory:

Step for importing DLPack:

This logic is implemented in dpctl's support for DLPack.
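
One part of the export side, mapping a tile (sub-device) to the device_id of its root device, could look roughly like the sketch below. This is an illustration, not dpctl's actual implementation. It assumes only that each device object has a parent_device attribute that is None for a root device (as dpctl.SyclDevice does), and the helper names are hypothetical.

```python
def root_of(device):
    """Walk up parent_device links until an unpartitioned (root) device is reached."""
    while getattr(device, "parent_device", None) is not None:
        device = device.parent_device
    return device


def dlpack_device_id(device, root_devices):
    """Return the ordinal of the root device of `device` within root_devices.

    root_devices is expected to be the list returned by dpctl.get_devices().
    """
    root = root_of(device)
    for ordinal, candidate in enumerate(root_devices):
        if candidate == root:
            return ordinal
    raise ValueError("root device not found among the known root devices")
```

The ordinal returned this way identifies only the root device. Which tile a given pointer belongs to would still have to be queried from the SYCL runtime on the importing side.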

oleksandr-pavlyk commented 5 months ago

@wozna Is this ticket ready to be resolved?

wozna commented 5 months ago

@oleksandr-pavlyk Yes it can be resolved, thank you for your answers.