Closed Kaveh01 closed 3 days ago
__shfl_up_sync, __shfl_down_sync, and __shfl_xor_sync as well.
We have some work left in the device compiler to support certain CUDA 9 device-side features such as the sync APIs. Also note that most AMD devices have a "warp size" of 64, so any code using a 32-bit mask is already broken.
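To make the mask-width point concrete, here is a minimal host-side sketch. The helper names `full_lane_mask` and `covers_all_lanes` are hypothetical and not part of CUDA or HIP; the only assumption is that one mask bit corresponds to one lane.

```cpp
#include <cstdint>

// Hypothetical helper (not a CUDA/HIP API): a mask with one bit set per lane.
constexpr std::uint64_t full_lane_mask(unsigned warp_size) {
    return warp_size >= 64 ? ~std::uint64_t{0}
                           : (std::uint64_t{1} << warp_size) - 1;
}

// True when `mask` names every lane of a warp of the given width.
constexpr bool covers_all_lanes(std::uint64_t mask, unsigned warp_size) {
    return (mask & full_lane_mask(warp_size)) == full_lane_mask(warp_size);
}

// 0xffffffff means "all lanes" for a 32-wide warp ...
static_assert(covers_all_lanes(0xffffffffu, 32), "full mask on 32 lanes");
// ... but it names only the lower half of a 64-wide wavefront.
static_assert(!covers_all_lanes(0xffffffffu, 64), "half mask on 64 lanes");
```

A literal `0xffffffff` that is correct on NVIDIA hardware therefore leaves lanes 32 to 63 out of the mask on a 64-wide AMD wavefront.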
Hi, I was trying to hipify some code and there are a few calls to __shfl_down_sync. The __shfl_down is deprecated, so it can not be used with CUDA 11. What would be the best approach?
@gmarkomanolis What I do when using hipify-perl as part of a build process is include a construction like,
#ifdef __HIP_PLATFORM_HCC__
#define SHFL_DOWN(val, offset) __shfl_down(val, offset)
#else
#define SHFL_DOWN(val, offset) __shfl_down_sync(0xffffffff, val, offset)
#endif
The specific constant I'm using there (__HIP_PLATFORM_HCC__) is old, so a newer one would be better.
Hi, I was trying to hipify some code and there are a few calls to __shfl_down_sync. The __shfl_down is deprecated, so it can not be used with CUDA 11. What would be the best approach?
__shfl_down is deprecated since CUDA 9.0, but it is not removed and still can be used even by CUDA 11.2.1.
The specific constant I'm using there (__HIP_PLATFORM_HCC__) is old, so a newer one would be better.
What do you mean by old?
I think mentions of hcc are being removed over time.
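For anyone landing here later, a sketch of the same shim keyed on a newer constant. This assumes `__HIP_PLATFORM_AMD__` is the current replacement for `__HIP_PLATFORM_HCC__` (check the HIP headers of your release); `SHFL_USES_SYNC` is a hypothetical name added only so that other code can tell which branch was taken.

```cpp
// Assumption: __HIP_PLATFORM_AMD__ is the newer spelling of __HIP_PLATFORM_HCC__.
// Both are checked so that older ROCm releases keep working.
#if defined(__HIP_PLATFORM_AMD__) || defined(__HIP_PLATFORM_HCC__)
// HIP on AMD: fall back to the non-sync shuffle.
#define SHFL_USES_SYNC 0  // hypothetical flag, not a HIP or CUDA macro
#define SHFL_DOWN(val, offset) __shfl_down(val, offset)
#else
// CUDA (or any other compiler): use the sync variant with a full 32-lane mask.
#define SHFL_USES_SYNC 1  // hypothetical flag, not a HIP or CUDA macro
#define SHFL_DOWN(val, offset) __shfl_down_sync(0xffffffffu, val, offset)
#endif
```

As in the original construction, this is only equivalent when every lane of the warp takes part in the shuffle, since the non-sync call has no mask argument at all.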
__shfl_down is deprecated since CUDA 9.0, but it is not removed and still can be used even by CUDA 11.2.1.
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions
Deprecation Notice: __shfl, __shfl_up, __shfl_down, and __shfl_xor have been deprecated in CUDA 9.0 for all devices.
Removal Notice: When targeting devices with compute capability 7.x or higher, __shfl, __shfl_up, __shfl_down, and __shfl_xor are no longer available and their sync variants should be used instead.
I will check though if it is on the code's side as it is not mine. Thanks for the answer.
@gmarkomanolis What I do when using hipify-perl as part of a build process is include a construction like,
#ifdef __HIP_PLATFORM_HCC__
#define SHFL_DOWN(val, offset) __shfl_down(val, offset)
#else
#define SHFL_DOWN(val, offset) __shfl_down_sync(0xffffffff, val, offset)
#endif
The specific constant I'm using there (__HIP_PLATFORM_HCC__) is old, so a newer one would be better.
Thanks a lot.
Hey, @emankov, any update on __shfl_sync? It would be great to have this implemented I think.
Any update on this? I am specifically looking for a solution to __shfl_sync.
If your code uses a mask of 0xffffffff, then you can just replace your _sync calls with the non-sync ones and it should work fine.
@Kaveh01 Apologies for the lack of response. Can you please test with latest ROCm 6.1.0 (HIP 6.1)? If resolved, please close ticket. Thanks!
@ppanchad-amd You could have just said that the _sync functions were added to the C++ kernel language in some ROCm/HIP version :shrug:
I am using ROCm 6.1.3, yet I still keep getting this issue: "error: use of undeclared identifier '__shfl_down_sync'"
The *_sync functions are not available in 6.1, see, e.g. https://github.com/ROCm/clr/tree/rocm-6.1.x/hipamd/include/hip/amd_detail . The develop branch has an implementation which may appear in a future release.
The develop implementation mentioned above has restrictions on its use that match the restrictions stated for Pascal in the CUDA guide.
The C++ Language Extensions documentation for ROCm 6.1.2 / HIP 6.1.40092 describes this as if the __sync functions were already a thing.
Note that the __sync variants are made available in ROCm 6.2
Note that this is the only reference to ROCm 6.2 in the entire document; the following sections simply list all the _sync variants without any reference to the future ROCm version. Why are future features documented in earlier releases? It seems like somebody just copy-pasted it from NVIDIA :shrug:
Apologies for the unclear documentation. These functions are available and disabled by default in 6.2 as stated, usable via a preprocessor macro. If there are issues with their functionality, feel free to comment and we can reopen this thread, or you can submit a new issue.
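For reference, a sketch of what opting in might look like. It assumes the macro in question is `HIP_ENABLE_WARP_SYNC_BUILTINS` and that the HIP sync builtins expect a 64-bit mask, which is how I read the HIP documentation for 6.2; please verify both against your release. `lane_mask_t`, `kAllLanes` and `warp_sum` are hypothetical names, and the device part is only compiled when the compiler is in HIP mode.

```cpp
#include <cstdint>

// Assumed macro name; it has to be defined before the HIP runtime header.
#define HIP_ENABLE_WARP_SYNC_BUILTINS 1

// Hypothetical alias: a 64-bit mask, one bit per lane of a 64-wide wavefront.
using lane_mask_t = std::uint64_t;
constexpr lane_mask_t kAllLanes = ~lane_mask_t{0};

#if defined(__HIP__)  // device code only when compiled as HIP
#include <hip/hip_runtime.h>

// Sums `v` across the wavefront; lane 0 ends up holding the total.
// Every lane named in the mask must reach this call together.
__device__ inline float warp_sum(float v) {
    for (int offset = warpSize / 2; offset > 0; offset /= 2) {
        v += __shfl_down_sync(kAllLanes, v, offset);
    }
    return v;
}
#endif
```

Code that passes a 32-bit literal such as `0xffffffff` as the mask would need to be changed to a 64-bit value under this assumption, which ties back to the warp-size remark earlier in the thread.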
CUDA 9 __shfl_sync function is missing. I can use the deprecated __shfl, but it would be better to have the new function. Test code: