NVIDIA / libcudacxx

[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
https://nvidia.github.io/libcudacxx
Other
2.29k stars 187 forks source link

libcu++ should not define __host__/__device__/__forceinline__ (or any other runtime macros) #460

Closed jrhemstad closed 1 year ago

jrhemstad commented 1 year ago

In our __config we have the following:

https://github.com/NVIDIA/libcudacxx/blob/b8b37d6784188cf4e64eb2f04d89797e8f77bea1/include/cuda/std/detail/__config#L14-L39

this will define things like __host__ to be empty when we detect that we're not compiling with nvcc in order to silence any errors about unrecognized symbols that would occur from host compilers that don't understand what __host__ is.

That is a reasonable thing in many cases, but there are legitimate use cases where this can break. For example, the following file compiled with a host compiler will fail to compile:

#include <cuda/std/type_traits>
#include <cuda_runtime_api.h>

This will fail due to a redefinition error of __host__ and friends because cuda/std/type_traits will detect we're not compiling with nvcc and define __host__ to empty, but then cuda_runtime_api.h will also define __host__, causing the redefinition error.

One could reasonably argue that the runtime headers should better guard against this situation by undefining __host__ before defining it. However, it is most expedient for us to fix this in libcu++.

The only robust solution I can think of is to avoid ever defining __host__/__device__/__forceinline__ ourselves and instead use our own macros like:

#if defined(__CUDACC__)
  #define _LIBCUDACXX_HOST __host__
#else 
  #define _LIBCUDACXX_HOST
#endif

And use those in place of any where we'd use __host__ (or other macros).

I'd expect we should rarely be using host/device annotations directly anyways. Those should come from __LIBCUDACXX_INLINE_VISIBILITY or just _LIBCUDACXX_EXECUTION_SPACE_SPECIFIER.

jrhemstad commented 1 year ago

It looks like the number of places we use __host__ or __device__ directly are pretty small, and they are almost always together.

miscco commented 1 year ago

We got another two related bug reports, so I have opened #476