Open JackAKirk opened 2 years ago
I would prefer solution 2 since SYCL is about simplifying heterogeneous programming.
This adds some latency in the construction but I hope a real application would not use a spam of sycl::make_device
.
By the way, https://github.com/intel/llvm/issues/6055 was created by @Ralender who works on triSYCL. Just to be sure he gets pinged to read this and see there is a follow-up. :-)
I agree with @keryell, I think there is a precedent for this in favour of option 2.
In section 4.6.4 we state:
The execution environment for a SYCL application has a fixed number of devices which does not vary as the application executes. The application can get a list of all these devices via device::get_devices(). The device class also provides constructors, but constructing a new device instance merely creates a new object that is a copy of one of the objects returned by device::get_devices().
This clarification was added along with a similar wording in section 4.6.2 to enforce that SYCL exposes a fix platform/device topology which is consistent and deterministic throughout the execution of the application, such that creating a new platform
or device
object simply creates a copy of one of the existing objects.
I think this same logic should be applied to make_platform
and make_device
to enforce this behaviour, so that these functions would return a copy of an existing platform
or device
for which the created object represents the same native backend object. The behaviour of this would be more like a lookup to find the platform
or device
which matches the native object provided. Then as the the new platform
or device
would be a copy those in the topology just as with any that are created the same common reference semantics rules would apply.
One question with this I see though is what happens if there is no platform
or device
in the topology that corresponds to the native backend object, would this still return an object that is not a copy of one in the topology, would that make sense, or should this through an exception?
make_device should return an existing sycl::device if one already exists with the specified native device.
I think it's not yet a problem as it's not really clear in the spec right now, but implementation may map sycl::devices to a poll of native devices (M:N mapping). In this case what should be the behavior? As he can have multiple sycl::device corresponding to a particular native device. I guess we can always require the implementation to choose an arbitrary one and to always return the same.
In short, we don't have a bijection between sycl::device and native::device and this raises a bunch of questions everywhere in the interopts land
make_device should return an existing sycl::device if one already exists with the specified native device.
I think it's not yet a problem as it's not really clear in the spec right now, but implementation may map sycl::devices to a pull of native devices (M:N mapping). In this case what should be the behavior? As he can have multiple sycl::device corresponding to this particular native device.
In short, we don't have a bijection between sycl::device and native::device and this raises a bunch of questions everywhere.
If there are some backends that have m:n mappings between sycl device and native device (is there an example of a backend where there may not be a bijection between sycl device and native device?) then either:
a) Behaviour of make_device is implementation defined with regard to "uniqueing" devices. (This is the current state of affairs if we assume that if something isn't mentioned in the spec it is implementation defined).
b) specify in the spec that sycl devices will only be uniqued if they only comprise a single native_device (assuming that make_device can only take a single native_device: if the spec is changed so that make_device can take a list of native_devices then require make_device returns an existing sycl device when the list of native_devices provided is a bijection of the list of native_devices contained within an existing sycl device).
c) Some other option I don't see?
(is there an example of a backend where there may not be a bijection between sycl device and native device?)
My understanding is that intel/llvm with CUDA backend now maps multiple streams to one sycl::device (only solution to implement efficient out-of-order queue). HipSYCL does that same. And maybe most of the application's map will subdevice
to the same native backend (I didn't check but that seems to be the most natural way to implement it...)
Exactly, right now the spec assumes a single native_device. So I think we can do the same here. But if one day we change the day to support this M:N mapping, we should not forget to think about the implications of this change on this issue of uniqueness.
May c
can just be make_device
is a "pure" function. Whatever you want to implement it, you should always return the same device given the same native
device provided.
(is there an example of a backend where there may not be a bijection between sycl device and native device?)
My understanding is that intel/llvm with CUDA backend now maps multiple streams to one sycl::device (only solution to implement efficient out-of-order queue). HipSYCL does that same. And maybe most of the application's map will
subdevice
to the same native backend (I didn't check but that seems to be the most natural way to implement it...)Exactly, right now the spec assumes a single native_device. So I think we can do the same here. But if one day we change the day to support this M:N mapping, we should not forget to think about the implications of this change on this issue of uniqueness.
So in the cuda backend the native object that contructs a sycl::device
is a CUdevice
: sycl::device does not need to contain any streams as members etc. So the fact that PI_CUDA uses multiple streams per device is not relevant to make_device.
Good point. I was wrong, Nobody map multiple sycl::device to native::device. They do that at the queue level! Then please just forget everything I said :)
Good point. I was wrong, Nobody map multiple sycl::device to native::device. They do that at the queue level! Then please just forget everything I said :)
I think this is true though:
Exactly, right now the spec assumes a single native_device. So I think we can do the same here.
But if one day we change the day to support this M:N mapping, we should not forget to think about the implications of this change on this issue of uniqueness.
It's possible some backend may do (if not now then in the future) this I guess.
FWIW, the Level Zero and CUDA backend specifications clarify that the behavior is option 2:
From the Level Zero specification:
Constructs a SYCL device instance from a Level-Zero
ze_device_handle_t
. The SYCL execution environment for the Level Zero backend contains a fixed number of devices that are enumerated viasycl::device::get_devices()
and a fixed number of sub-devices that are enumerated viasycl::device::create_sub_devices(...)
. Calling this function does not create a new device. Rather it merely creates asycl::device
object that is a copy of one of the devices from those enumerations.
From the proposed CUDA backend (#197):
Construct a SYCL
device
from a CUDA device. As the SYCL execution environment for the CUDA backend contains a fixed number of devices that are enumerated viasycl::device::get_devices()
. Calling this function does not create a new device. Rather it merely creates asycl::device
object that is a copy of one of the devices from that enumeration.
We should probably add similar language to the OpenCL backend spec.
Will it fail it it exists or a random id that doesn't correspond to anything? But what about remote accelerators that is not enumerated: device ID not reported by SYCL
Another use-case is hot-plug devices, where it is complex to have all the high-level SYCL objects in sync while iterating on them, so it seems difficult to assume that any back-end device is already in the get_devices()
.
@keryell, would you be happy if we add wording to the OpenCL backend interop specification that is similar to the wording we have already for CUDA and Level Zero? For example (describing make_device
):
Construct a SYCL
device
object from an OpenCL device ID. The SYCL execution environment for the OpenCL backend contains a fixed number of root devices that are enumerated viasycl::device::get_devices()
. Calling this function with a device ID of an OpenCL root device returns a copy of one of thedevice
objects that is returned from that call.
This leaves the question open as to whether some other backend could support a more dynamic environment of root devices.
https://github.com/intel/llvm/issues/6055 raised the issue that some implementations may allow
make_device
to return a duplicate sycl::device if there is already asycl::device
owning the specified native device provided as an argument tomake_device
. One implication of this is that there can be multiplesycl::device
s that independently reference count a single native device and therefore there can be multiple independentsycl::device
responsible for destruction of a single native device. A draft "fix" for this is here : https://github.com/intel/llvm/pull/6204.Assuming that sycl::device is stateless in every backend the request made in https://github.com/intel/llvm/issues/6055 that
sycl::device
s be "uniqued" seems reasonable to me. However I don't see anything covering this in either "4.5.1. Backend interoperability" section of the SYCL 2020 rev 5 spec, or in Appendix "C.7.1. Construct SYCL objects from OpenCL ones".I would like to clarify which of the following is true (unless there is another option)?
make_device
is implementation defined with regard to "uniqueing" devices. (This is the current state of affairs if we assume that if something isn't mentioned in the spec it is implementation defined).make_device
should return an existing sycl::device if one already exists with the specified native device.make_device
should return a new (non "uniqued") sycl::device even if one already exists with the specified native device.Thanks