SYCL buffer/USM pointer interoperability

colleeneb commented 1 year ago

Hello,

Are there any plans for SYCL buffer-USM pointer interoperability? Apologies if this was discussed before, I searched the issues and found this: https://github.com/KhronosGroup/SYCL-Docs/issues/230#issuecomment-1059199180 but I'm not sure if it's the same.

As an example, it looks like OpenSYCL has some implementation for it: https://github.com/OpenSYCL/OpenSYCL/blob/develop/doc/buffer-usm-interop.md

The motivation for this question is: What if there is a SYCL library that was written just with USM API support (i.e. for functions it implements, the library API takes in a queue and USM pointers as arguments), is there a way for a code written with SYCL buffers and accessors to still use the library?

Thank you!

gmlueck commented 1 year ago

I think this could not be part of the core SYCL specification unless we also make USM a required feature. (Currently, a device is only guaranteed to support "device" USM if it has aspect::usm_device_allocations.) Even then, adding an API like this to the core spec would require implementors to implement buffer on top of USM, which is not currently the case.

An API like this might make more sense as a Level Zero backend interop API, which does already provide interoperation between buffers and USM pointers. It probably does not make sense to have an API like this in the OpenCL interop API because OpenCL backends are likely to implement buffers on top of cl_mem rather than using USM.

TApplencourt commented 1 year ago

Why do you think that a buffer interop API will require the buffer to use USM? We can call the API get_native and not get_pointer :)

More seriously, The major problem we want to solve is that some Library developers want to propose only raw pointer (USM) + queue API. If they do that, how can the Application use such libraries if their code uses buffers?

Also I think such interop capabilities will help the adoption of buffers API. But indeed, we can restrict the API only available if aspect::usm_device_allocations is present.

jinz2014 commented 1 year ago

This may be beyond the topic. Would the SYCL specification delete SYCL buffers and support just one style (raw pointer) one day ? Reduce complexities.

TApplencourt commented 1 year ago

I'm against it. Buffers are a really nice abstraction! Maybe just a little to be verbose now but let's not be side tracked :)

jinz2014 commented 1 year ago

ok

kevin-harms commented 1 year ago

Why does have a requiring interop with USM require implementation of USM? I think it can safely be stated that if USM is supported, then interop with buffers must be supported. I'm not sure the best way to do that, but I agree that if it is supported, it provides an onramp for codes to adopt buffers and leveraging the dependencies for kernel execution.

I think this could not be part of the core SYCL specification unless we also make USM a required feature. (Currently, a device is only guaranteed to support "device" USM if it has aspect::usm_device_allocations.) Even then, adding an API like this to the core spec would require implementors to implement buffer on top of USM, which is not currently the case.

An API like this might make more sense as a Level Zero backend interop API, which does already provide interoperation between buffers and USM pointers. It probably does not make sense to have an API like this in the OpenCL interop API because OpenCL backends are likely to implement buffers on top of cl_mem rather than using USM.

kevin-harms commented 1 year ago

I won't offer an opinion, but I think within the next two years (or maybe now) the mass of code using USM will be >> than buffers, so if we don't have a method of interop, we will keep seeing this type of request to drop support for buffers.

This may be beyond the topic. Would the SYCL specification delete SYCL buffers and support just one style (raw pointer) one day ? Reduce complexities.

TApplencourt commented 1 year ago

So let's say I have a code that uses a buffer, and I want to interop with a function that use USM. For now, I need to do

sycl::event foo_usr(T* ptr, sycl::queue Q) {}

foo_buffer(sycl::buffer<T> B, sycl::queue Q) { 
     // Allocate Memory
     auto *B_tmp = Q.malloc_device<T>(B.size(), Q);
     // Copy to GPU
     Q.submit([&](sycl::handler &cgh) {
        sycl::accessor accessorB{B, cgh, sycl::read_only};
        cgh.parallel_for(B.size(),  [=](auto idx) {
            B_tmp[idx] = accessorB[idx];
      }).wait();
      // Call Function in a blocking manner
      foo_usr(B_tmp, Q).wait();
      // Copy back
     Q.submit([&](sycl::handler &cgh) {
        sycl::accessor accessorB{B, cgh, sycl::write_only};
        cgh.parallel_for(B.size(),  [=](auto idx) {
            accessorB[idx] = B_tmp[idx]
      }).wait();
      sycl::free(B_tmp , Q);
}

I think we can all agree that it's a little tedious and not super efficient.

One option will be creating a new type of accessor that gives access to the underlying pointer and who can be user to set the lifetime. So the code can be roughly like this:

sycl::event foo_usr(T* ptr, sycl::queue Q) {}

foo_buffer(sycl::buffer<T> B, sycl::queue Q) { 
    usm_accessors B_native{B, Q}; // Similar to a host-accessors
    e =  foo_usr(B_native.get_pointer(), Q) // Similar to host-accessors too
    B_native.set_synchronization_event(e);  // Just so the Buffer can continue tracking the dependency DAG, or we can always synchronize when `usm_accessors` go out of scope.
}

What do people think? It doesn't address the creation of a buffer from a USM pointer, but this is a start. Pinging @illuhad as OpenSYCL have this nice extension but requires all buffer to be backed by USM.

gmlueck commented 1 year ago

How would you expect this API to work for a backend that doesn't use USM to implement buffer? For example, the OpenCL backend is required to provide interoperation with cl_mem, so I would expect that all OpenCL backend implementations would use cl_mem to implement buffer instead of USM. What would you expect to happen if the user constructed a usm_accessor for such a buffer?

TApplencourt commented 1 year ago

Indeed usm_accessor should be backed by USM allocation. I expect an implementation that use cl_men to implement tmp USM Buffer workaround (the method users need to do currently).

This tmp-buffer will only be needed when user will use usm_accessor, so no implementation are required to use USM for their buffer.

illuhad commented 1 year ago

Pinging @illuhad as OpenSYCL have this nice extension but requires all buffer to be backed by USM.

Small correction: It only requires buffers to be backed by USM pointers on those devices which support USM (currently, all our backends support USM unconditionally, so this is not much of an issue. If we had a backend or device that cannot support this, the interop API would need to throw exceptions). I don't think this is an unreasonable requirement, since clearly if you want to do USM-buffer interop, you need to have a device that supports USM.

I strongly believe that implementing buffers on top of USM in this case allows cleaner APIs and better, more understandable guarantees for users, and I see little reason not to do it if we want to have an interop API.

gmlueck commented 1 year ago

I expect an implementation that use cl_men to implement tmp USM Buffer workaround (the method users need to do currently).

We need to think through the case when the buffer has separate cl_mem and USM pointers to the same data, and make sure it is clear which copy is current. Is the idea that set_synchronization_event tells the usm_accessor when the contents of the USM buffer are no longer needed, and thus it is safe for the implementation to copy the data back to the cl_mem?

TApplencourt commented 1 year ago

. Is the idea that set_synchronization_event tells the usm_accessor when the contents of the USM buffer are no longer needed, and thus it is safe for the implementation to copy the data back to the cl_mem?

Exactly, and the usm_accessors can take a lock to the cl_mem allocation to avoid any race condition (and simplify the which have the correct value question)

illuhad commented 1 year ago

Some more thoughts:

I don't quite follow what should happen on OpenCL. If the OpenCL device supports USM, we could also have the buffer live on top of USM pointers for that device, and thus provide better interop. If USM is not supported, this API won't help either. IMO the OpenCL backend specification should be changed to not require buffer interop with cl_mem unconditionally, and allow implementations to opt out.
Allowing get_pointer() outside of device code is confusing because for device accessors, this is absolutely, absolutely forbidden, and I don't think we should start allowing this in some cases.
It's unclear to me what this usm_accessor needs to support. Accessors are in general very complicated classes. Would we need to duplicate the accessor API here? Or is this class really just about the get_pointer(), and there is no other relation to accessors?

TApplencourt commented 1 year ago

I don't quite follow what should happen on OpenCL. If the OpenCL device supports USM, we could also have the buffer live on top of USM pointers for that device, and thus provide better interop. If USM is not supported, this API won't help either. IMO the OpenCL backend specification should be changed to not require buffer interop with cl_mem unconditionally, and allow implementations to opt out.

I have no opinion about it. For me, this is just an implementation detail to make the interopt more efficient, not a requirement.

Allowing get_pointer() outside of device code is confusing because for device accessors, this is absolutely, absolutely forbidden, and I don't think we should start allowing this in some cases.

Make, sense. We can rename it to get_usm_pointer<T>() where T is ::shared, ::device, ::host.

It's unclear to me what this usm_accessor needs to support. Accessors are in general very complicated classes. Would we need to duplicate the accessor API here? Or is this class really just about the get_pointer(), and there is no other relation to accessors?

The second case. It's mostly about get_pointer, and the lifetime of the use allocation returned. So we can keep the API of buffer simple and put all the interrupt required functions in the usm_accesor class. The idea of using an accessor seemed natural to me. Like with an accessor, we want to modify the buffer and to the dependency management. But maybe this is indeed confusing to people.

jinz2014 commented 1 year ago

Is it helpful to invite users and developers to vote the long-term support for SYCL buffers ? Users/developers can focus much less on how to migrate CUDA/HIP programs to buffers, and much more on how to improve performance (portability) of SYCL programs across devices.

tomdeakin commented 1 year ago

@TApplencourt please prepare for an in-depth discussion!

KhronosGroup / SYCL-Docs

SYCL buffer/USM pointer interoperability #391