ROCm / clr

MIT License

Use __hip_atomic_fetch_sub #7

Open ldrumm opened 10 months ago

ldrumm commented 10 months ago

Where available, __hip_atomic_fetch_sub can be used to implement the atomicSub family.

Introduced in LLVM commit e3fbede7f3f.
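A minimal sketch of what "where available" could look like, assuming the builtin is detected with `__has_builtin` and that the fallback is a fetch-add of the negated value. `sketch_atomic_sub` and `sketch_usage` are hypothetical names for illustration, not the code in this patch or in the HIP headers; a real device function would also carry `__device__`, which is left out here so the sketch builds as plain host C++ too.

```cpp
#include <cassert>

#ifndef __has_builtin
#define __has_builtin(x) 0  // compiler has no __has_builtin: treat the builtin as absent
#endif

// Hypothetical helper, for illustration only.
// Returns the value stored at *address before the subtraction.
inline int sketch_atomic_sub(int* address, int val) {
#if __has_builtin(__hip_atomic_fetch_sub) && defined(__HIP_MEMORY_SCOPE_AGENT)
  // Builtin available: subtract directly at agent (device) scope.
  return __hip_atomic_fetch_sub(address, val, __ATOMIC_RELAXED,
                                __HIP_MEMORY_SCOPE_AGENT);
#else
  // Fallback (assumed here): fetch-add of the negated value, using the
  // generic GCC/Clang atomic builtin.
  return __atomic_fetch_add(address, -val, __ATOMIC_RELAXED);
#endif
}

// Usage: the return value is the old contents, as with atomicSub.
inline void sketch_usage() {
  int counter = 10;
  int old = sketch_atomic_sub(&counter, 3);
  assert(old == 10 && counter == 7);
}
```

Either branch should be observably the same to the caller; the guard only decides which builtin the compiler sees.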

cjatin commented 10 months ago

@yxsamliu should we use the fetch_sub?

ldrumm commented 9 months ago

Thanks for checking on this, Jatin.

I've just found out that this isn't practically feasible at the moment. Atomic sub operations on shared USM address ranges can be mishandled by the PCIe bus without explicit prefetches, and since the ROCm/HSA drivers have no way to catch this, the sub operations can end up not happening at all (see intel/llvm#7252 and this ROCm ticket for details). I think that behaviour is a bug that needs to be addressed, and that correctness has to be in place at the device and driver level before anything built on top can work. I'll leave this patch open as a reminder, but please understand that merging it will break HIP atomics a little more until the lower levels of the stack work reliably.

yxsamliu commented 9 months ago

We probably should not use __hip_atomic_fetch_sub for atomicSub_system, since it does not work across PCIe. We may still use it for atomicSub.
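One way that split could be sketched, assuming the device-scope variant switches to the new builtin while the system-scope variant stays on a fetch-add of the negated value. The names `sketch_atomic_sub_device`, `sketch_atomic_sub_system` and `SKETCH_HAVE_HIP_ATOMICS` are hypothetical, the `__device__` qualifier is omitted, and the generic `__atomic_fetch_add` fallback is there only so the sketch also builds as host C++.

```cpp
#include <cassert>

#ifndef __has_builtin
#define __has_builtin(x) 0  // compiler has no __has_builtin: treat builtins as absent
#endif

// Hypothetical feature macro: true only when both HIP builtins and the
// scope macros are visible to this compilation.
#if __has_builtin(__hip_atomic_fetch_sub) && \
    __has_builtin(__hip_atomic_fetch_add) && \
    defined(__HIP_MEMORY_SCOPE_AGENT) && defined(__HIP_MEMORY_SCOPE_SYSTEM)
#define SKETCH_HAVE_HIP_ATOMICS 1
#else
#define SKETCH_HAVE_HIP_ATOMICS 0
#endif

// Device scope (atomicSub analogue): may use the new fetch_sub builtin.
inline int sketch_atomic_sub_device(int* address, int val) {
#if SKETCH_HAVE_HIP_ATOMICS
  return __hip_atomic_fetch_sub(address, val, __ATOMIC_RELAXED,
                                __HIP_MEMORY_SCOPE_AGENT);
#else
  return __atomic_fetch_add(address, -val, __ATOMIC_RELAXED);
#endif
}

// System scope (atomicSub_system analogue): deliberately avoids fetch_sub
// and subtracts by adding the negated value instead.
inline int sketch_atomic_sub_system(int* address, int val) {
#if SKETCH_HAVE_HIP_ATOMICS
  return __hip_atomic_fetch_add(address, -val, __ATOMIC_RELAXED,
                                __HIP_MEMORY_SCOPE_SYSTEM);
#else
  return __atomic_fetch_add(address, -val, __ATOMIC_RELAXED);
#endif
}

// Usage: both variants return the old value and leave the same result.
inline void sketch_scope_usage() {
  int d = 5, s = 5;
  assert(sketch_atomic_sub_device(&d, 2) == 5 && d == 3);
  assert(sketch_atomic_sub_system(&s, 2) == 5 && s == 3);
}
```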