Open linehill opened 1 year ago
Does this work for CUDA? Officially not supported. I see a solution here:
https://stackoverflow.com/questions/18950732/atomic-max-for-floats-in-opencl
This seems to be the same situation as atomicAdd(): the long version is not officially in HIP, but it's used.
@pjaaskel Is the implementation in the link I sent good enough for us?
LGTM, but it could be simplified by applying atomic_compare_exchange_strong directly on the floating-point values, without the need for bitcasting.
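To make the suggestion concrete, here is a minimal host-side sketch of a float atomic max built as a compare-exchange loop directly on the floating-point value. It is only an illustration in standard C++ (`std::atomic<float>`), not the code from the linked answer and not chipStar's device implementation. The helper name `atomic_max_float` is hypothetical, and relaxed memory ordering is an assumption.

```cpp
#include <atomic>

// Hypothetical helper (the name is an assumption, not a HIP or chipStar API).
// Atomically stores max(target, value) and returns the value seen before
// the update, using a compare-exchange loop on the float itself.
inline float atomic_max_float(std::atomic<float>& target, float value) {
    float observed = target.load(std::memory_order_relaxed);
    // Retry only while our value is still larger than the stored one.
    // When compare_exchange_strong fails, it refreshes `observed` with
    // the current contents of `target`, so no explicit reload is needed.
    while (observed < value &&
           !target.compare_exchange_strong(observed, value,
                                           std::memory_order_relaxed)) {
    }
    return observed;
}
```

Calling `atomic_max_float(m, 3.5f)` on an atomic that holds `1.0f` would leave `3.5f` stored and return `1.0f`. A later call with `2.0f` would leave the stored value unchanged. The exchange compares bit patterns, so a NaN input is simply never stored, because `observed < value` is false.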
A case discovered from HeCBench/lebesgue-hip. Reduced case:
This compiles with ROCm's hipcc but chipStar's hipcc fails: