Should we use CUDA's implicit host-device move and forward?

The programming guide says:

E.3.14.3. Rvalue references

By default, the CUDA compiler will implicitly consider std::move and std::forward function templates to have __host__ __device__ execution space qualifiers, and therefore they can be invoked directly from device code. The nvcc flag --no-host-device-move-forward will disable this behavior; std::move and std::forward will then be considered as __host__ functions and will not be directly invokable from device code.

We currently use our own kat:: versions of those two functions. Should we drop them in favor of std::move() and std::forward(), and rely on this nvcc behavior instead?
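To make the two options concrete, here is a minimal sketch, not the actual cuda-kat code. The `sketch` namespace, the `SKETCH_HD` macro and the two swap helpers are hypothetical names introduced only for illustration; the macro expands to `__host__ __device__` under nvcc and to nothing otherwise, so the snippet also builds with a plain C++ compiler.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical macro: CUDA execution space qualifiers under nvcc, nothing otherwise.
#ifdef __CUDACC__
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

namespace sketch {  // hypothetical stand-in for the kat:: namespace

// Explicitly qualified equivalent of std::move
template <typename T>
SKETCH_HD constexpr typename std::remove_reference<T>::type&& move(T&& t) noexcept
{
    return static_cast<typename std::remove_reference<T>::type&&>(t);
}

// Explicitly qualified equivalent of std::forward (lvalue overload)
template <typename T>
SKETCH_HD constexpr T&& forward(typename std::remove_reference<T>::type& t) noexcept
{
    return static_cast<T&&>(t);
}

// Explicitly qualified equivalent of std::forward (rvalue overload)
template <typename T>
SKETCH_HD constexpr T&& forward(typename std::remove_reference<T>::type&& t) noexcept
{
    static_assert(!std::is_lvalue_reference<T>::value,
                  "cannot forward an rvalue as an lvalue");
    return static_cast<T&&>(t);
}

} // namespace sketch

// Option A: use our own qualified move; does not depend on nvcc's implicit treatment.
template <typename T>
SKETCH_HD void swap_with_own_move(T& a, T& b)
{
    T tmp = sketch::move(a);
    a = sketch::move(b);
    b = sketch::move(tmp);
}

// Option B: use std::move directly; in device code this relies on nvcc
// implicitly treating std::move as __host__ __device__.
template <typename T>
SKETCH_HD void swap_with_std_move(T& a, T& b)
{
    T tmp = std::move(a);
    a = std::move(b);
    b = std::move(tmp);
}
```

On the host the two helpers behave identically. The difference would show up only in device code: going by the guide text quoted above, Option B should stop compiling there if a user builds with --no-host-device-move-forward, while Option A should be unaffected.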

eyalroz/cuda-kat, issue #47