Open j-stephan opened 5 years ago
Interestingly, Intel is trying hard to hide what you are asking for: https://github.com/intel/llvm/pull/348
Can you imagine an API that could be added to the standard?
An easy solution that doesn't require an API change would be to correctly deduce the overloads, i.e. foo(global_ptr)
, foo(local_ptr)
and so on. This is not very intuitive, though, and might break user APIs.
From the programmer's point of view it would be preferable to allow multi_ptr
construction on raw pointers without having to specify the address space. The compiler should be able to figure this out by itself since it knows about the address spaces anyway.
On the other hand it should raise an error if the programmer tries to assign a raw pointer in local space to a global_ptr
. Currently this doesn't happen, both the Intel and ComputeCpp compiler will happily compile if I pass the same pointer to global_ptr
's and local_ptr
's constructor.
Admittedly I haven't given this much thought yet (I only encountered the problem on Wednesday), I'll try to think this through on the weekend.
The weekend has passed... Apart from the solutions above the best I could come up with is something like cl::sycl::pointer_traits
to be added to the specification. The interface would look something along the lines of
template <typename Ptr>
struct pointer_traits
{
static_assert(is_raw_ptr_type(Ptr), "Ptr needs to be a raw pointer type");
using pointer_t = /* implementation-defined */ Ptr;
using address_space = /* implementation-defined */;
// maybe add other traits here
};
Since the compiler needs to figure out the address space on its own anyway (if I understand Section 6.8 correctly), it would fill out the implementation-defined
parts. A programmer could then use SFINAE or if constexpr
to adapt to the different address spaces.
This is basically the problem multi_ptr
tries to solve, it already encapsulates the functionality above. However, multi_ptr
requires the user to specify the address space before using it. This makes sense because we can request a multi_ptr
from a buffer accessor, a local accessor, and so on and the multi_ptr
data structure has to know about its address space. It also renders us unable to construct it from a pointer we don't know the address space of.
So my straight-forward resolution still is to remove the requirement to specify the address space for the multi_ptr
type. Instead the compiler needs to figure out the correct value for the address_space
member of multi_ptr
(or the Space
template parameter). If this is not an option because of implications I'm not aware of (and I'm sure there are plenty) I'd shoot for the pointer_traits
option.
Since the compiler needs to figure out the address space on its own anyway (if I understand Section 6.8 correctly), it would fill out the implementation-defined parts. A programmer could then use SFINAE or
if constexpr
to adapt to the different address spaces.
The problem is that this address space resolution can be done in LLVM or even in the SPIR-V backend or whatever... So you might not have this information inside Clang as a type trait... :-(
multi_ptr
was designed:
multi_ptr
type. But with auto
nowadays it is easier;multi_ptr
to call an OpenCL function with different version and different names for each possible address-space...The problem is that this address space resolution can be done in LLVM or even in the SPIR-V backend or whatever... So you might not have this information inside Clang as a type trait... :-(
I have to admit that my knowledge about compiler construction is a bit limited. But the backends will have to look up this information, too - why can't the frontend do the same?
To avoid requiring this kind of address-space inference by avoiding using raw pointers. Of course this means passing around the multi_ptr type. But with auto nowadays it is easier;
While I can understand this intent with regard to new code I believe this is an oversight if we consider legacy code bases. If those have a raw pointer API the design of multi_ptr
or the lack of a feature to otherwise extract the address space becomes a major obstacle.
SYCL WG: there is an ongoing discussion internally. We will get back as soon as we have an update.
related? https://gitlab.khronos.org/sycl/Specification/-/issues/607 need to decide if this is the right direction for SYCL
Imagine a device-side function with the following signature:
I don't know if
vec
comes from global, local, constant or private memory. However, insidefoo
I'd like to do something tovec
which requires me to know the address space of the pointer, e.g. acl::sycl::atomic_fetch_add
. How do I tell themulti_ptr
/atomic
insidefoo
which address space is needed? Simply using aglobal_ptr
will break ifvec
actually resides in local memory. Usingmulti_ptr
will fail because the address space template parameter is missing. Creating anatomic
by passingvec
to its constructor will fail becausevec
isn't amulti_ptr
. Usingatomic_fetch_add
onvec
will fail becausevec
isn't anatomic
type.Some implementations (like ComputeCpp) internally use
__global
to annotate the pointer during device compilation. But even if there was a way to write something likevoid foo(__global int* vec)
(there isn't as far as I know, ComputeCpp complains if I do this) this would be a bad idea because the address space attributes are implementation-defined.Why do we need this? Sadly, there are libraries / frameworks out there that pass around raw pointers but where a SYCL backend is planned / worked on.
Edit: I also tried to overload
foo
withglobal_ptr
,local_ptr
etc. directly. This will fail because the call is ambigous.