ROCm / ROCm-Device-Libs

ROCm Device Libraries

Forward progress guarantees of ockl_hsa_signal_cas? #69

Closed by jpsamaroo 3 years ago

jpsamaroo commented 3 years ago

I've implemented a simple form of hostcall in AMDGPU.jl, the Julia AMDGPU compute library. To make it safe for concurrent wavefront access, I use ockl_hsa_signal_cas to atomically transition a signal between various states. I'm running into an issue where the CAS loop that performs the transition doesn't make forward progress, even when only a single wavefront is trying to use the hostcall.

I don't have access to a working debugger (ROCgdb doesn't seem to work for me; see https://github.com/ROCm-Developer-Tools/ROCgdb/issues/5), so everything I know about this issue comes from observing behavior indirectly (basically, does the kernel complete or not?). Are the details of this CAS implementation documented anywhere, specifically the situations in which forward progress is not guaranteed? I can provide more details about how I'm invoking this intrinsic if that would help.

jpsamaroo commented 3 years ago

Never mind, it seems I was using it wrong: I assumed the return value was the new value of the signal, when it is actually the old value.
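
For anyone who hits the same confusion, here is a minimal host-side C++ sketch of the difference. It does not call the device library. `cas_returning_old` is a hypothetical stand-in built on `std::atomic` that mimics a CAS returning the value observed before the operation, and the `READY` and `LOCKED` states are made up for illustration. The point is only that success must be detected by comparing the return value against the expected (old) value, not against the desired (new) one.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical signal states, invented for this illustration.
constexpr std::int64_t READY  = 0;
constexpr std::int64_t LOCKED = 1;

// Hypothetical stand-in for a CAS that returns the OLD value of the signal.
// On success, compare_exchange_strong leaves `expected` untouched (it equals
// the old value); on failure, it overwrites `expected` with the value it
// actually observed. Either way `expected` ends up holding the old value.
inline std::int64_t cas_returning_old(std::atomic<std::int64_t>& sig,
                                      std::int64_t expected,
                                      std::int64_t desired) {
    sig.compare_exchange_strong(expected, desired,
                                std::memory_order_acq_rel,
                                std::memory_order_acquire);
    return expected;
}

// Correct check: the swap happened iff the returned old value equals `from`.
inline bool transition_ok(std::atomic<std::int64_t>& sig,
                          std::int64_t from, std::int64_t to) {
    return cas_returning_old(sig, from, to) == from;
}

// Misreading: treats the return value as the NEW value. A successful swap
// returns `from`, so this reports failure even though the signal changed.
inline bool transition_misread(std::atomic<std::int64_t>& sig,
                               std::int64_t from, std::int64_t to) {
    return cas_returning_old(sig, from, to) == to;
}

// Small self-check exercising both interpretations.
inline void demo() {
    std::atomic<std::int64_t> sig{READY};
    assert(!transition_misread(sig, READY, LOCKED)); // reports failure...
    assert(sig.load() == LOCKED);                    // ...but the swap happened
    sig.store(READY);
    assert(transition_ok(sig, READY, LOCKED));       // correct success check
    assert(!transition_ok(sig, READY, LOCKED));      // already LOCKED: fails
}
```

A loop built on the misread check keeps retrying a transition that has already taken place, which from the outside can look like a lack of forward progress.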