NVIDIA / cccl

CUDA Core Compute Libraries
https://nvidia.github.io/cccl/
Other
1.22k stars 156 forks source link

Inline asm constraint violation in cp_async_bulk_wait_group.h and fence.h #2225

Closed Artem-B closed 2 days ago

Artem-B commented 2 months ago

Clang complains about inline asm constraint violation:

In file included from test/libcudacxx/cuda/ptx/ptx.cp.async.bulk.wait_group.compile.pass.cpp:14:
In file included from include/cuda/ptx:76:
include/cuda/__ptx/instructions/cp_async_bulk_wait_group.h:48:19: error: constraint 'n' expects an integer constant expression
   48 |     (asm volatile("cp.async.bulk.wait_group %0;"
      |                   ^
include/cuda/__ptx/instructions/cp_async_bulk_wait_group.h:71:19: error: constraint 'n' expects an integer constant expression
   71 |     (asm volatile("cp.async.bulk.wait_group.read %0;"
      |                   ^
2 errors generated when compiling for sm_90.

https://github.com/NVIDIA/cccl/blob/6ee3415a8d0eea82d0d6f9915aa249a6ceb13e24/libcudacxx/include/cuda/__ptx/instructions/cp_async_bulk_wait_group.h#L50

We're passing a function argument to an inline asm constraint that requires a constant.

Did we intend to use template argument _N32 instead?

Artem-B commented 2 months ago

Similar issue is also present in fence.h: https://github.com/NVIDIA/cccl/blob/6ee3415a8d0eea82d0d6f9915aa249a6ceb13e24/libcudacxx/include/cuda/__ptx/instructions/fence.h#L259

In file included from third_party/gpus/cccl/libcudacxx/test/libcudacxx/cuda/ptx/ptx.tensormap.cp_fenceproxy.compile.pass.cpp:14:
In file included from blaze-out/k8-opt/bin/third_party/gpus/cuda/_virtual_includes/_stage/third_party/gpus/cuda/include/cuda/ptx:87:
blaze-out/k8-opt/bin/third_party/gpus/cuda/_virtual_includes/_stage/third_party/gpus/cuda/include/cuda/__ptx/instructions/tensormap_cp_fenceproxy.h:61:11: error: constraint 'n' expects an integer constant expression
   61 |           "tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.cta.sync.aligned  [%0], [%1], %2;"
      |           ^
blaze-out/k8-opt/bin/third_party/gpus/cuda/_virtual_includes/_stage/third_party/gpus/cuda/include/cuda/__ptx/instructions/tensormap_cp_fenceproxy.h:67:11: error: constraint 'n' expects an integer constant expression
   67 |           "tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.cluster.sync.aligned  [%0], [%1], %2;"
      |           ^
blaze-out/k8-opt/bin/third_party/gpus/cuda/_virtual_includes/_stage/third_party/gpus/cuda/include/cuda/__ptx/instructions/tensormap_cp_fenceproxy.h:73:11: error: constraint 'n' expects an integer constant expression
   73 |           "tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned  [%0], [%1], %2;"
      |           ^
blaze-out/k8-opt/bin/third_party/gpus/cuda/_virtual_includes/_stage/third_party/gpus/cuda/include/cuda/__ptx/instructions/tensormap_cp_fenceproxy.h:79:11: error: constraint 'n' expects an integer constant expression
   79 |           "tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.sys.sync.aligned  [%0], [%1], %2;"
      |           ^
miscco commented 2 months ago

Thanks a lot for bringing that up. I have added an explicit call to the conversion operator so that we indeed pass the value around