PDoakORNL closed 5 years ago
NVCC is just more relaxed about consistent declarations. Clang requires the declaration and the implementation to be marked identically. This comes from the fact that clang uses the __host__ / __device__ marking as part of function overload resolution. With clang this is valid:
struct foo {
  __host__ static void hello() { printf("Hello from Host\n"); }
  __device__ static void hello() { printf("Hello from Device\n"); }
};
With NVCC you can't do that and you need to do the following:
struct foo {
  __host__ __device__ static void hello() {
#ifndef __CUDA_ARCH__
    printf("Hello from Host\n");
#else
    printf("Hello from Device\n");
#endif
  }
};
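The point about marking the declaration and the implementation identically could look roughly like the sketch below. The function name add_one and the empty fallback macros are assumptions for illustration only; the fallback just lets the same file build with a plain host C++ compiler, where the CUDA markers do not exist.

```cpp
#include <cassert>

// Assumption: __CUDACC__ is defined whenever the file is compiled as CUDA.
// Outside of CUDA the markers are defined away so the sketch stays buildable.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Declaration: marked for both host and device.
__host__ __device__ int add_one(int x);

// Definition: carries exactly the same markers as the declaration,
// which is the consistency the comment above says Clang insists on.
__host__ __device__ int add_one(int x) { return x + 1; }

// Host-side sanity check of the hypothetical helper.
inline void check_add_one() { assert(add_one(41) == 42); }
```

Dropping the markers from only one of the two lines is the kind of mismatch that, per the comment above, NVCC tolerates and Clang rejects.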
Furthermore there is some funkiness about the visibility of functions. With Clang you have to have a consistent inventory of functions for host and device compilation. With NVCC you can define device functions inside an #ifdef __CUDA_ARCH__ and a host function inside the #else branch. The details are a bit more complicated than that but this is the gist.
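A minimal sketch of the NVCC-style pattern just described, where each compilation pass sees a different set of functions. The name side_id is hypothetical, and this has not been checked against any particular compiler version; it only illustrates the shape of the code.

```cpp
#include <cassert>

// Assumption: outside of CUDA compilation the markers are defined away
// so the sketch also builds with a plain host C++ compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

#ifdef __CUDA_ARCH__
// Visible only while compiling for the device.
__device__ inline int side_id() { return 1; }
#else
// Visible only while compiling for the host (or with a plain C++ compiler).
inline int side_id() { return 0; }
#endif

// Host-side sanity check: the host pass must pick the #else branch.
inline void check_side_id() { assert(side_id() == 0); }
```

Here the host pass and the device pass each see only one of the two definitions, which is the inconsistent inventory that, according to the comment above, Clang does not accept. The Clang-friendly alternative is a single __host__ __device__ function, or a __host__ / __device__ overload pair, visible in both passes.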
What is the point of the __host__ __device__ for nvcc? Is it required but ignored, or not used at all?
It marks the function as "provide a CPU version and a GPU version".
Well, for clang it does. It seems like for nvcc only __device__ and __host__ __device__ are meaningful on their own, except possibly in a .cu file. Anyway, this problem is resolved, although it seems unlikely to me that all the functions marked KOKKOS_INLINE_FUNCTION need to be compiled for the device.
I am not using the nvcc wrapper; I am just using clang to build. I get a couple of errors like this:
Clang actually cares about __host__ and __device__ (nvcc, I think, does "magic" here).
Changing the inline to KOKKOS_INLINE_FUNCTION clears that error and takes you to more issues relating to haphazard treatment of host and device.
I could not resolve this one.
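For context on the inline to KOKKOS_INLINE_FUNCTION change mentioned above, here is a rough sketch of how a macro of that kind can work. MY_INLINE_FUNCTION, square and sum_of_squares are hypothetical names, and this is not Kokkos's actual definition, only an illustration of the idea.

```cpp
#include <cassert>

// Hypothetical stand-in for a macro such as KOKKOS_INLINE_FUNCTION.
// Assumption: __CUDACC__ is defined whenever the file is compiled as CUDA.
#if defined(__CUDACC__)
#define MY_INLINE_FUNCTION __host__ __device__ inline
#else
#define MY_INLINE_FUNCTION inline
#endif

// Written as plain `inline`, this helper would be host-only under CUDA,
// so a host/device caller could not use it on the device side.
MY_INLINE_FUNCTION double square(double x) { return x * x; }

// Caller and callee now carry the same markers.
MY_INLINE_FUNCTION double sum_of_squares(double a, double b) {
  return square(a) + square(b);
}

// Host-side sanity check of the hypothetical helpers.
inline void check_sum_of_squares() { assert(sum_of_squares(3.0, 4.0) == 25.0); }
```

The design choice is that one macro keeps every function in a call chain marked the same way, so a compiler that is strict about host and device markers has nothing to complain about. The cost, as noted earlier in the thread, is that functions may get compiled for the device even if they are never called there.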