OpenCL make_device ownership of native device should be uniqued?

JackAKirk commented 2 years ago

https://github.com/intel/llvm/issues/6055 raised the issue that some implementations may allow make_device to return a duplicate sycl::device if there is already a sycl::device owning the specified native device provided as an argument to make_device. One implication of this is that there can be multiple sycl::devices that independently reference count a single native device and therefore there can be multiple independent sycl::device responsible for destruction of a single native device. A draft "fix" for this is here : https://github.com/intel/llvm/pull/6204.

Assuming that sycl::device is stateless in every backend the request made in https://github.com/intel/llvm/issues/6055 that sycl::devices be "uniqued" seems reasonable to me. However I don't see anything covering this in either "4.5.1. Backend interoperability" section of the SYCL 2020 rev 5 spec, or in Appendix "C.7.1. Construct SYCL objects from OpenCL ones".

I would like to clarify which of the following is true (unless there is another option)?

Behaviour of make_device is implementation defined with regard to "uniqueing" devices. (This is the current state of affairs if we assume that if something isn't mentioned in the spec it is implementation defined).
make_device should return an existing sycl::device if one already exists with the specified native device.
make_device should return a new (non "uniqued") sycl::device even if one already exists with the specified native device.

Thanks

keryell commented 2 years ago

I would prefer solution 2 since SYCL is about simplifying heterogeneous programming. This adds some latency in the construction but I hope a real application would not use a spam of sycl::make_device. By the way, https://github.com/intel/llvm/issues/6055 was created by @Ralender who works on triSYCL. Just to be sure he gets pinged to read this and see there is a follow-up. :-)

AerialMantis commented 2 years ago

I agree with @keryell, I think there is a precedent for this in favour of option 2.

In section 4.6.4 we state:

The execution environment for a SYCL application has a fixed number of devices which does not vary as the application executes. The application can get a list of all these devices via device::get_devices(). The device class also provides constructors, but constructing a new device instance merely creates a new object that is a copy of one of the objects returned by device::get_devices().

This clarification was added along with a similar wording in section 4.6.2 to enforce that SYCL exposes a fix platform/device topology which is consistent and deterministic throughout the execution of the application, such that creating a new platform or device object simply creates a copy of one of the existing objects.

I think this same logic should be applied to make_platform and make_device to enforce this behaviour, so that these functions would return a copy of an existing platform or device for which the created object represents the same native backend object. The behaviour of this would be more like a lookup to find the platform or device which matches the native object provided. Then as the the new platform or device would be a copy those in the topology just as with any that are created the same common reference semantics rules would apply.

One question with this I see though is what happens if there is no platform or device in the topology that corresponds to the native backend object, would this still return an object that is not a copy of one in the topology, would that make sense, or should this through an exception?

TApplencourt commented 2 years ago

make_device should return an existing sycl::device if one already exists with the specified native device.

I think it's not yet a problem as it's not really clear in the spec right now, but implementation may map sycl::devices to a poll of native devices (M:N mapping). In this case what should be the behavior? As he can have multiple sycl::device corresponding to a particular native device. I guess we can always require the implementation to choose an arbitrary one and to always return the same.

In short, we don't have a bijection between sycl::device and native::device and this raises a bunch of questions everywhere in the interopts land

JackAKirk commented 2 years ago

make_device should return an existing sycl::device if one already exists with the specified native device.

I think it's not yet a problem as it's not really clear in the spec right now, but implementation may map sycl::devices to a pull of native devices (M:N mapping). In this case what should be the behavior? As he can have multiple sycl::device corresponding to this particular native device.

In short, we don't have a bijection between sycl::device and native::device and this raises a bunch of questions everywhere.

If there are some backends that have m:n mappings between sycl device and native device (is there an example of a backend where there may not be a bijection between sycl device and native device?) then either:

a) Behaviour of make_device is implementation defined with regard to "uniqueing" devices. (This is the current state of affairs if we assume that if something isn't mentioned in the spec it is implementation defined).

b) specify in the spec that sycl devices will only be uniqued if they only comprise a single native_device (assuming that make_device can only take a single native_device: if the spec is changed so that make_device can take a list of native_devices then require make_device returns an existing sycl device when the list of native_devices provided is a bijection of the list of native_devices contained within an existing sycl device).

c) Some other option I don't see?

TApplencourt commented 2 years ago

(is there an example of a backend where there may not be a bijection between sycl device and native device?)

My understanding is that intel/llvm with CUDA backend now maps multiple streams to one sycl::device (only solution to implement efficient out-of-order queue). HipSYCL does that same. And maybe most of the application's map will subdevice to the same native backend (I didn't check but that seems to be the most natural way to implement it...)

Exactly, right now the spec assumes a single native_device. So I think we can do the same here. But if one day we change the day to support this M:N mapping, we should not forget to think about the implications of this change on this issue of uniqueness.

May c can just be make_device is a "pure" function. Whatever you want to implement it, you should always return the same device given the same native device provided.

JackAKirk commented 2 years ago

(is there an example of a backend where there may not be a bijection between sycl device and native device?)

My understanding is that intel/llvm with CUDA backend now maps multiple streams to one sycl::device (only solution to implement efficient out-of-order queue). HipSYCL does that same. And maybe most of the application's map will subdevice to the same native backend (I didn't check but that seems to be the most natural way to implement it...)

Exactly, right now the spec assumes a single native_device. So I think we can do the same here. But if one day we change the day to support this M:N mapping, we should not forget to think about the implications of this change on this issue of uniqueness.

So in the cuda backend the native object that contructs a sycl::device is a CUdevice: sycl::device does not need to contain any streams as members etc. So the fact that PI_CUDA uses multiple streams per device is not relevant to make_device.

TApplencourt commented 2 years ago

Good point. I was wrong, Nobody map multiple sycl::device to native::device. They do that at the queue level! Then please just forget everything I said :)

JackAKirk commented 2 years ago

Good point. I was wrong, Nobody map multiple sycl::device to native::device. They do that at the queue level! Then please just forget everything I said :)

I think this is true though:

Exactly, right now the spec assumes a single native_device. So I think we can do the same here.
But if one day we change the day to support this M:N mapping, we should not forget to think about the implications of this change on this issue of uniqueness.

It's possible some backend may do (if not now then in the future) this I guess.

gmlueck commented 2 years ago

FWIW, the Level Zero and CUDA backend specifications clarify that the behavior is option 2:

make_device should return an existing sycl::device if one already exists with the specified native device.

From the Level Zero specification:

Constructs a SYCL device instance from a Level-Zero ze_device_handle_t. The SYCL execution environment for the Level Zero backend contains a fixed number of devices that are enumerated via sycl::device::get_devices() and a fixed number of sub-devices that are enumerated via sycl::device::create_sub_devices(...). Calling this function does not create a new device. Rather it merely creates a sycl::device object that is a copy of one of the devices from those enumerations.

From the proposed CUDA backend (#197):

Construct a SYCL device from a CUDA device. As the SYCL execution environment for the CUDA backend contains a fixed number of devices that are enumerated via sycl::device::get_devices(). Calling this function does not create a new device. Rather it merely creates a sycl::device object that is a copy of one of the devices from that enumeration.

We should probably add similar language to the OpenCL backend spec.

fraggamuffin commented 2 years ago

Will it fail it it exists or a random id that doesn't correspond to anything? But what about remote accelerators that is not enumerated: device ID not reported by SYCL

keryell commented 2 years ago

Another use-case is hot-plug devices, where it is complex to have all the high-level SYCL objects in sync while iterating on them, so it seems difficult to assume that any back-end device is already in the get_devices().

gmlueck commented 2 years ago

@keryell, would you be happy if we add wording to the OpenCL backend interop specification that is similar to the wording we have already for CUDA and Level Zero? For example (describing make_device):

Construct a SYCL device object from an OpenCL device ID. The SYCL execution environment for the OpenCL backend contains a fixed number of root devices that are enumerated via sycl::device::get_devices(). Calling this function with a device ID of an OpenCL root device returns a copy of one of the device objects that is returned from that call.

This leaves the question open as to whether some other backend could support a more dynamic environment of root devices.

KhronosGroup / SYCL-Docs

OpenCL make_device ownership of native device should be uniqued? #266