ROCm / HIP

HIP: C++ Heterogeneous-Compute Interface for Portability
https://rocmdocs.amd.com/projects/HIP/
MIT License
3.71k stars 528 forks source link

[HIP][device] 4 __shfl_sync functions are missing #1491

Closed Kaveh01 closed 3 days ago

Kaveh01 commented 5 years ago

CUDA 9 __shfl_sync function is missing. I can use the deprecated __shfl but it would be be better to have the new function. Test code:

   __global__
static void shflTest(int lid){
    int tid = threadIdx.x;
    float value = tid + 0.1f;
    int* ivalue = reinterpret_cast<int*>(&value);

    //use the integer shfl
    int ix = __shfl(ivalue[0],5,32);
    int iy = __shfl_sync(0xFFFFFFFF, ivalue[0],5,32);

    float x = reinterpret_cast<float*>(&ix)[0];
    float y = reinterpret_cast<float*>(&iy)[0];

    if(tid == lid){
        printf("shfl tmp %d %d\n",ix,iy);
        printf("shfl final %f %f\n",x,y);
    }
}

int main()
{
    shflTest<<<1,32>>>(0);
    cudaDeviceSynchronize();
    return 0;
}
emankov commented 5 years ago

__shfl_up_sync, __shfl_down_sync, and __shfl_xor_sync as well.

b-sumner commented 5 years ago

We have some work left in the device compiler to support certain cuda 9 device side features such as the sync APIs. Also note that most AMD devices have a "warp size" of 64, so any code using a 32 bit mask is already broken.

gmarkomanolis commented 3 years ago

Hi, I was trying to hipify a code and there are a few calls to shfl_down_sync. The shfl_down is deprecated, so it can not be used with CUDA 11. What would be the best approach?

acowley commented 3 years ago

@gmarkomanolis What I do when using hipify-perl as part of a build process is include a construction like,

#ifdef __HIP_PLATFORM_HCC__
#define SHFL_DOWN(val, offset) __shfl_down(val, offset)
#else
#define SHFL_DOWN(val, offset) __shfl_down_sync(0xffffffff, val, offset)
#endif

The specific constant I'm using there (__HIP_PLATFORM_HCC) is old, so a newer one would be better.

emankov commented 3 years ago

Hi, I was trying to hipify a code and there are a few calls to shfl_down_sync. The shfl_down is deprecated, so it can not be used with CUDA 11. What would be the best approach?

__shfl_down is deprecated since CUDA 9.0, but it is not removed and still can be used even by CUDA 11.2.1.

emankov commented 3 years ago

The specific constant I'm using there (__HIP_PLATFORM_HCC) is old, so a newer one would be better.

What do you mean by old?

acowley commented 3 years ago

I think mentions of hcc are being removed over time.

gmarkomanolis commented 3 years ago

Hi, I was trying to hipify a code and there are a few calls to shfl_down_sync. The shfl_down is deprecated, so it can not be used with CUDA 11. What would be the best approach?

__shfl_down is deprecated since CUDA 9.0, but it is not removed and still can be used even by CUDA 11.2.1.

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions

Deprecation Notice: shfl, shfl_up, __shfl_down, and __shfl_xor have been deprecated in CUDA 9.0 for all devices.

Removal Notice: When targeting devices with compute capability 7.x or higher, shfl, shfl_up, __shfl_down, and __shfl_xor are no longer available and their sync variants should be used instead.

I will check though if it is on the code's side as it is not mine. Thanks for the answer.

gmarkomanolis commented 3 years ago

@gmarkomanolis What I do when using hipify-perl as part of a build process is include a construction like,

#ifdef __HIP_PLATFORM_HCC__
#define SHFL_DOWN(val, offset) __shfl_down(val, offset)
#else
#define SHFL_DOWN(val, offset) __shfl_down_sync(0xffffffff, val, offset)
#endif

The specific constant I'm using there (__HIP_PLATFORM_HCC) is old, so a newer one would be better.

Thanks a lot.

jammm commented 3 years ago

Hey, @emankov, any update on __shfl_sync ? It would be great to have this implemented I think.

leachim commented 2 years ago

Any update on this? I am specifically looking for a solution to __shfl_sync

jammm commented 2 years ago

If your code uses a mask of 0xffffffff, then you can just replace your _sync calls with the non-sync ones and it should work fine.

ppanchad-amd commented 5 months ago

@Kaveh01 Apologies for the lack of response. Can you please test with latest ROCm 6.1.0 (HIP 6.1)? If resolved, please close ticket. Thanks!

lahwaacz commented 5 months ago

@ppanchad-amd You could have just said that the _sync functions were added to the C++ kernel language in some ROCm/HIP version :shrug:

Vishal-S-P commented 3 months ago

I am using rocm 6.1.3 yet I still keep getting this issue. "error: use of undeclared identifier '__shfl_down_sync'"

b-sumner commented 3 months ago

The *_sync functions are not available in 6.1, see, e.g. https://github.com/ROCm/clr/tree/rocm-6.1.x/hipamd/include/hip/amd_detail . The develop branch has an implementation which may appear in a future release.

b-sumner commented 3 months ago

The develop implementation mentioned above has restrictions on its use that match the restrictions stated for pascal in the cuda guide.

lahwaacz commented 3 months ago

The C++ Language Extensions documentation for ROCm 6.1.2 / HIP 6.1.40092 describes this as if the __sync functions were already a thing.

Note that the __sync variants are made available in ROCm 6.2

Note that this is the only reference to ROCm 6.2 in the entire document, the following sections simply list all the _sync variants without any reference to the future ROCm version. Why are future features documented in earlier releases? It seems like somebody just copy-pasted it from NVIDIA :shrug:

schung-amd commented 3 days ago

Apologies for the unclear documentation. These functions are available and disabled by default in 6.2 as stated, usable via a preprocessor macro. If there are issues with their functionality, feel free to comment and we can reopen this thread, or you can submit a new issue.