QMCPACK / miniqmc

QMCPACK miniapp: a simplified real space QMC code for algorithm development, performance portability testing, and computer science experiments
Other
26 stars 35 forks source link

Kokkos: Clang7/CUDA 9.2 has issues with function declarations and defnitions involving KOKKOS_INLINE_FUNCTION #199

Closed PDoakORNL closed 5 years ago

PDoakORNL commented 5 years ago

I am not using the nvcc wrapper I am just using clang to build. I get a couple of errors like this

In file included from ../src/Drivers/check_wfc.cpp:39:
../src/QMCWaveFunctions/Jastrow/TwoBodyJastrow.h:309:49: error: __host__ __device__ function 'operator()' cannot overload __host__ function 'operator()'
KOKKOS_INLINE_FUNCTION void TwoBodyJastrow<FT>::operator() (const typename policy_t::member_type& team) const {
                                                ^
../src/QMCWaveFunctions/Jastrow/TwoBodyJastrow.h:158:15: note: previous declaration is here
  inline void operator()(const typename policy_t::member_type& team) const;

Clang actually cares about host and device (nvcc I think does "magic" here)

inline void operator()(const typename policy_t::member_type& team) const

template<typename FT>
KOKKOS_INLINE_FUNCTION void TwoBodyJastrow<FT>::operator() (const typename policy_t::member_type& team) const {
  int jg = jg_hack;
  int iStart = first[jg];
  int iEnd = last[jg];
 // printf("Hi %d %d %d\n",jg,iStart,iEnd);
  F[igt+jg].evaluateVGL(team,iat,iStart, iEnd, dist, u, du, d2u, DistCompressed.data(),
                        DistIndice.data());
}

changing the inline to KOKKOS_INLINE_FUNCTION clears that error and takes you to more issues relating to haphazard treatment of host and device

../src/Particle/Lattice/CrystalLattice.h:196:15: error: reference to __host__ function '~TinyVector' in __host__ __device__ function
    val_dot = toUnit(r);
              ^
../src/Numerics/OhmmsPETE/TinyVector.h:154:3: note: '~TinyVector' declared here
  ~TinyVector() {}
  ^
../src/QMCWaveFunctions/einspline_spo.hpp:347:25: note: called by 'operator()'
    auto u    = Lattice.toUnit_floor(tmp_pos);
                        ^
/scratch/epd/KOKKOS/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel.hpp:651:7: note: called by 'exec_team<qmcplusplus::einspline_spo<double>::EvaluateVGHTag>'
    { m_functor( TagType() , member ); }
      ^
/scratch/epd/KOKKOS/kokkos/core/src/Cuda/Kokkos_Cuda_Parallel.hpp:684:23: note: called by 'operator()'
      this-> template exec_team< WorkTag >(
                      ^
/scratch/epd/KOKKOS/kokkos/core/src/Cuda/Kokkos_CudaExec.hpp:162:3: note: called by 'cuda_parallel_launch_constant_memory<Kokkos::Impl::ParallelFor<qmcplusplus::einspline_spo<double>, Kokkos::TeamPolicy<qmcplusplus::einspline_spo<double>::EvaluateVGHTag>, Kokkos::Cuda> >'
  driver();

I could not resolve this one.

crtrott commented 5 years ago

NVCC is just more relaxed about consistent declarations. Clang requires you to have both the declaration and the implementation be marked identically. This is coming from the fact that clang is using the host device marking as part of function overload resolution. With clang this is valid:

struct foo {
   __host__ static void hello() {printf("Hello from Host\n");}
   __device__ static void hello() {printf("Hello from Device\n");}
};

With NVCC you can't do that and you need to do the following:

struct foo {
  __host__ __device__ static void hello () {
    #ifndef __CUDA_ARCH__
    printf("Hello from Host\n");
    #else
    printf("Hello from Device\n");
    #endif
  }
};

Furthermore there is some funkyness about visibility of functions. With Clang you have to have a consistent inventory of functions for host and device compilation. With NVCC you can define device functions inside an #ifdef __CUDA_ARCH__ and a host function inside the #else branch. The details are a bit more complicated than that but this is the gist.

PDoakORNL commented 5 years ago

What is the point of the __host__ __device__ for nvcc? It's required but ignored or not used at all?

crtrott commented 5 years ago

It marks the function as " provide a CPU version and a GPU version"

PDoakORNL commented 5 years ago

Well for clang it does. It seems like for nvcc only device and host device is meaningful on its own except possibly in a .cu. Anyway this problem is resolved although it seems unlikely to me that all the functions marked KOKKOS_INLINE_FUNCTION need to be compiled for the device.