ROCm / clr

MIT License
85 stars 35 forks source link

Missing warp match functions in HIP #9

Closed Epliz closed 5 months ago

Epliz commented 10 months ago

Hi,

As pointed out at https://github.com/ROCm-Developer-Tools/hipamd/issues/65 , match_any/match_all are not available in HIP. These are available in CUDA (cf. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-match-functions ), and can be implemented on AMD GPUs on Vega+ architectures (such intrinsic corresponds to "WaveMatch" in HLSL shader model 6.5 https://microsoft.github.io/DirectX-Specs/d3d/HLSL_ShaderModel6_5.html#wavematch-function which is supported by Vega+).

Therefore it seems like they can and should be added.

match_any can for example be implemented as seen at https://github.com/llvm/llvm-project/issues/62477 :

static inline __device__ uint64_t  __match_any(int value) {
  bool active = true;
  uint64_t result = 0;

  while (active) {
    // determine what threads have the same value as the currently first active thread
    int first_active_value = __builtin_amdgcn_readfirstlane(value);
    int predicate = (value == first_active_value);
    uint64_t m = __ballot(predicate); // THIS LINE IS PROBLEMATIC

    // if the current thread has the same value, set its result mask to the current one
    if (predicate) {
      result |= m;
      active = false;
    }
  }

  return result;
}

There used to be compiler bugs making it hard to implement them as with the code above, but they have been fixed. Feel free to use that code if you want to.

Best regards, Epliz

Epliz commented 10 months ago

@cjatin , you were kind enough to have a look at https://github.com/ROCm-Developer-Tools/clr/issues/2 ; this one is quite similar, so I would appreciate if you could have a look at this one as well :) .

cjatin commented 5 months ago

https://github.com/ROCm/clr/commit/c5ab5680b400b5b21e9c8c3849ff67e29afcbfc9

Epliz commented 5 months ago

Thanks a lot @cjatin , looking forward to trying it in the next release with it