Open ldrumm opened 10 months ago
@yxsamliu should we use the fetch_sub?
Thanks for checking on this, Jatin.
I've just found out that this isn't practically feasible at the moment. Atomic sub operations on shared USM address ranges can be mishandled by the PCIe bus without explicit prefetches, and because the ROCm/HSA drivers have no way to catch this, the sub operations can silently fail to happen (see intel/llvm#7252 and this ROCm ticket for details). However, I think that behaviour is a bug that needs to be addressed, and that correctness must be present at the device and driver level before anything on top can work. I'll leave this patch open as a reminder, but understand that merging it will break HIP atomics a little more until the lower levels of the stack work reliably.
Probably we should not use __hip_atomic_fetch_sub for atomicSub_system, since it does not work across PCIe. We may still use it for atomicSub.
Where available, `__hip_atomic_fetch_sub` can be used to implement the `atomicSub` family.

Introduced in llvm e3fbede7f3f
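For illustration only, here is a minimal sketch of what "where available" could look like for the agent-scope `int` overload. It is not the code in this patch: `sketch_atomic_sub` and `SKETCH_DEVICE` are hypothetical names, the relaxed ordering is an assumption, and the four-argument form of `__hip_atomic_fetch_sub` (pointer, value, memory order, scope) is assumed to mirror `__hip_atomic_fetch_add`. Outside a HIP compile it falls back to the generic `__atomic_fetch_sub` builtin, so it also builds as plain C++ with GCC or Clang.

```cpp
// Older compilers may not provide __has_builtin at all.
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if defined(__HIP__)
#include <hip/hip_runtime.h>
#define SKETCH_DEVICE __device__   // hypothetical helper macro
#else
#define SKETCH_DEVICE              // plain C++ build: no device qualifier
#endif

// Hypothetical stand-in for the int overload of atomicSub.
// Returns the value stored at *address before the subtraction.
SKETCH_DEVICE inline int sketch_atomic_sub(int* address, int val) {
#if defined(__HIP__) && __has_builtin(__hip_atomic_fetch_sub)
  // Assumed signature: (ptr, value, memory order, memory scope).
  // Agent scope only; system scope is the case discussed as unreliable
  // across PCIe in this thread.
  return __hip_atomic_fetch_sub(address, val, __ATOMIC_RELAXED,
                                __HIP_MEMORY_SCOPE_AGENT);
#else
  // Fallback: generic GCC/Clang atomic builtin, no HIP scope argument.
  return __atomic_fetch_sub(address, val, __ATOMIC_RELAXED);
#endif
}
```

A system-scope variant would presumably pass `__HIP_MEMORY_SCOPE_SYSTEM` instead, which is exactly the case the comments above suggest leaving alone until the driver-level issue is resolved.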