Add a mechanism for making device function pointers usable on the host side

It is not possible to simply take the address of a __device__ function in host-side code and use it. However, you can obtain a usable pointer by copying it from a global device-side variable which holds the function's address, like so:

__device__ void foo(int x) { /*... */ }
__device__ void (*ptr_on_device) (int x) = foo;

void bar() {
  void (*ptr_on_host)(int x);
  cuda::memory::region_t pod_region = cuda::memory::locate(ptr_on_device);
  cuda::memory::copy(&ptr_on_host, pod_region.data(), pod_region.size());
  // or using raw CUDA API calls:
  // cudaMemcpyFromSymbol(&ptr_on_host, ptr_on_device, sizeof(void (*)(int)));
  // ... and check the error

  // Now do stuff with ptr_on_host
}

See also this question on StackOverflow.

Perhaps we could add a mechanism to the library which abstracts the above. It would mostly be host-side code, but being able to pass device-side function pointers to kernels is definitely a useful tool for kernel authors.
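
As a rough idea of what such an abstraction might look like, here is a minimal sketch using only raw CUDA runtime calls. The names `fetch_device_function_pointer`, `unary_op`, `twice`, `apply` and `apply_twice_on_device` are all made up for illustration and are not part of cuda-kat or any other library. The sketch assumes everything lives in a single translation unit.

```cuda
#include <cuda_runtime.h>

// Signature shared by the device functions we want to pass around.
using unary_op = int (*)(int);

__device__ int twice(int x) { return 2 * x; }

// Device-side variable holding the function's device address.
__device__ unary_op twice_ptr_on_device = twice;

// Hypothetical helper (not an existing library API): copies the value of a
// __device__ function-pointer variable into a host-side pointer.
template <typename F>
cudaError_t fetch_device_function_pointer(F& ptr_on_host, const F& symbol)
{
    return cudaMemcpyFromSymbol(&ptr_on_host, symbol, sizeof(F));
}

// The kernel receives the function pointer as an ordinary argument.
__global__ void apply(unary_op f, const int* in, int* out)
{
    *out = f(*in);
}

// Runs twice() on the device via the fetched pointer; returns the CUDA status.
cudaError_t apply_twice_on_device(int input, int& result)
{
    unary_op f = nullptr;
    cudaError_t status = fetch_device_function_pointer(f, twice_ptr_on_device);
    if (status != cudaSuccess) { return status; }

    int* buffer = nullptr;  // buffer[0] is the input, buffer[1] the output
    status = cudaMalloc(&buffer, 2 * sizeof(int));
    if (status != cudaSuccess) { return status; }

    status = cudaMemcpy(buffer, &input, sizeof(int), cudaMemcpyHostToDevice);
    if (status == cudaSuccess) {
        apply<<<1, 1>>>(f, buffer, buffer + 1);
        status = cudaGetLastError();
    }
    if (status == cudaSuccess) {
        // This blocking copy waits for the kernel on the default stream.
        status = cudaMemcpy(&result, buffer + 1, sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(buffer);
    return status;
}
```

The pointer value held on the host is only meaningful on the device: it can be passed to kernels, as above, but must not be called from host code. A real library mechanism would probably also want to hide the declaration of the device-side variable, which this sketch leaves to the user.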

eyalroz / cuda-kat, issue #94