Closed Artem-B closed 2 days ago
Similar issue is also present in fence.h: https://github.com/NVIDIA/cccl/blob/6ee3415a8d0eea82d0d6f9915aa249a6ceb13e24/libcudacxx/include/cuda/__ptx/instructions/fence.h#L259
In file included from third_party/gpus/cccl/libcudacxx/test/libcudacxx/cuda/ptx/ptx.tensormap.cp_fenceproxy.compile.pass.cpp:14:
In file included from blaze-out/k8-opt/bin/third_party/gpus/cuda/_virtual_includes/_stage/third_party/gpus/cuda/include/cuda/ptx:87:
blaze-out/k8-opt/bin/third_party/gpus/cuda/_virtual_includes/_stage/third_party/gpus/cuda/include/cuda/__ptx/instructions/tensormap_cp_fenceproxy.h:61:11: error: constraint 'n' expects an integer constant expression
61 | "tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.cta.sync.aligned [%0], [%1], %2;"
| ^
blaze-out/k8-opt/bin/third_party/gpus/cuda/_virtual_includes/_stage/third_party/gpus/cuda/include/cuda/__ptx/instructions/tensormap_cp_fenceproxy.h:67:11: error: constraint 'n' expects an integer constant expression
67 | "tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.cluster.sync.aligned [%0], [%1], %2;"
| ^
blaze-out/k8-opt/bin/third_party/gpus/cuda/_virtual_includes/_stage/third_party/gpus/cuda/include/cuda/__ptx/instructions/tensormap_cp_fenceproxy.h:73:11: error: constraint 'n' expects an integer constant expression
73 | "tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [%0], [%1], %2;"
| ^
blaze-out/k8-opt/bin/third_party/gpus/cuda/_virtual_includes/_stage/third_party/gpus/cuda/include/cuda/__ptx/instructions/tensormap_cp_fenceproxy.h:79:11: error: constraint 'n' expects an integer constant expression
79 | "tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.sys.sync.aligned [%0], [%1], %2;"
| ^
Thanks a lot for bringing that up. I have added an explicit call to the conversion operator so that we indeed pass the value around
Clang complains about inline asm constraint violation:
https://github.com/NVIDIA/cccl/blob/6ee3415a8d0eea82d0d6f9915aa249a6ceb13e24/libcudacxx/include/cuda/__ptx/instructions/cp_async_bulk_wait_group.h#L50
We're passing a function argument to an inline asm constraint that requires a constant.
Did we intend to use template argument
_N32
instead?