There's no GPU-side std::terminate()

NVIDIA / cccl

CUDA Core Compute Libraries

https://nvidia.github.io/cccl/

There's no GPU-side std::terminate() #1600

Open Artem-B opened 7 months ago

Artem-B commented 7 months ago

https://github.com/NVIDIA/cccl/blob/4e6f24a6af6b661ed561f1e598df109f03973cdf/libcudacxx/test/support/test_allocator.h#L129

https://godbolt.org/z/E8Ef7x7GM

Considering that the test is used from both the host and the device side, probably the easiest fix would be to replace it with an always-false assert, which is available on both the host and the GPU.
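A minimal sketch of what the always-false assert could look like, assuming a helper that has to compile for both host and device. `TEST_HOST_DEVICE` and `checked_slot` are hypothetical names used for illustration only; this is not the code at the linked line of test_allocator.h. Note that `assert` compiles away when `NDEBUG` is defined, so this is not a strict equivalent of `std::terminate()`.

```cpp
#include <cassert>

// Hypothetical annotation macro: expands to nothing when the file is
// compiled as plain C++, and to __host__ __device__ under a CUDA compiler.
#if defined(__CUDACC__)
#  define TEST_HOST_DEVICE __host__ __device__
#else
#  define TEST_HOST_DEVICE
#endif

// Hypothetical helper standing in for the failing branch in the test header.
// assert() is usable in both host and device code, unlike std::terminate().
TEST_HOST_DEVICE inline int checked_slot(int index, int count) {
  if (index < 0 || index >= count) {
    assert(false && "slot index out of range");
  }
  // With NDEBUG defined the assert above is a no-op and execution continues.
  return index;
}
```

On the valid path the helper just returns its argument, e.g. `checked_slot(2, 5)` yields `2`.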

miscco commented 7 months ago

Actually we have cuda::std::terminate() that we can use internally. https://github.com/NVIDIA/cccl/blob/main/libcudacxx/include/cuda/std/detail/libcxx/include/__exception/terminate.h

It calls __trap() on device as the next best thing.
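As a rough illustration of that dispatch, here is a sketch of a terminate wrapper that traps on device and calls `std::terminate()` on host. The names `SKETCH_HOST_DEVICE`, `sketch_terminate`, and `require_positive` are hypothetical, and this is an assumption about the general shape, not the actual contents of terminate.h.

```cpp
#include <exception>

// Hypothetical annotation macro: empty for a plain C++ compiler,
// __host__ __device__ when compiled as CUDA.
#if defined(__CUDACC__)
#  define SKETCH_HOST_DEVICE __host__ __device__
#else
#  define SKETCH_HOST_DEVICE
#endif

// Hypothetical wrapper: __CUDA_ARCH__ is only defined during the device
// compilation pass, so each pass sees exactly one of the two calls.
SKETCH_HOST_DEVICE inline void sketch_terminate() noexcept {
#if defined(__CUDA_ARCH__)
  __trap();          // device: abort the kernel
#else
  std::terminate();  // host: regular termination
#endif
}

// Hypothetical caller that can be used from host and device code alike.
SKETCH_HOST_DEVICE inline int require_positive(int value) {
  if (value <= 0) {
    sketch_terminate();
  }
  return value;
}
```

Only the failing branch reaches the wrapper; `require_positive(7)` simply returns `7`.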

miscco commented 7 months ago

Funnily enough, I have https://github.com/NVIDIA/cccl/pull/1583 open, and that does not use test_allocator.h.

Artem-B commented 7 months ago

> Actually we have cuda::std::terminate() that we can use internally.

What do I need to include to make it available? #include <cuda/std/__exception> is not sufficient, as it only pulls in the host-side std::terminate.

miscco commented 7 months ago

That should actually work 🤔

<cuda/std/__exception> pulls in <cuda/std/detail/libcxx/include/exception>, which pulls in <cuda/std/detail/libcxx/include/__exception/terminate.h>.

There we have a definition of cuda::std::terminate(), which essentially forwards to __cccl_terminate().

Artem-B commented 7 months ago

> That should actually work

Yet it does not, with either clang or nvcc, as the Compiler Explorer reproducer demonstrates: https://godbolt.org/z/oen737Pqh

<source>(8): error: calling a __host__ function("std::terminate()") from a __global__ function("square") is not allowed
<source>(8): error: identifier "std::terminate" is undefined in device code

miscco commented 7 months ago

But you are calling std::terminate.

It works fine with cuda::std::terminate: https://godbolt.org/z/cP531sj93

miscco commented 7 months ago

Oh, now I understand. That code is inside a test header that was brought in from libcxx in our initial fork.

AFAIK it is never actually used anywhere. I have been purging all unused files in product code, but have not done so in the test code. I need to sweep through it soon.

leofang commented 2 months ago

Preferably trap is not used in device code: https://github.com/NVIDIA/cccl/issues/939#issuecomment-1802542072