ARM-software / synchronization-benchmarks

Collection of synchronization micro-benchmarks and traces from infrastructure applications
Other
37 stars 36 forks source link

Fix race condition in synchronize_threads() #62

Closed geoffreyblake closed 3 years ago

geoffreyblake commented 3 years ago

Discovered a race condition in the synchronize_threads thread barrier implementation in atomics.h. Threads could race between the fetchadd64_acquire and the checking on the value of the barrier, which occasionally leads to deadlock. Fix is to use the returned old value of the barrier to check if the exit condition has been reached.

geoffreyblake commented 3 years ago

@rpgolshan, are you still a maintainer of this project and able to help merge this?

rpgolshan commented 3 years ago

I suppose I am by default.

This change looks fine and passes the unit tests on an m6g instance.

Just to be clear on the issue this fixes: Let's say there are 3 threads. One thread can get stuck in the "else" statement in synchronize_threads because the other two threads go into the "if" statement due to a race condition. And this commit ensures only one thread can enter the "if" statement. Is that right?

geoffreyblake commented 3 years ago

What can happen is this with 2 threads, doing the synchronize_threads routine back-to-back to sync up after thread creation, then for timer calibration:

Thread 0 enters synchronize_threads Thread 1 enters synchronize_threads Thread 0 does atomic-add Thread 1 does atomic-add Thread 1 reads *barrier and sees the counter needs to be reset Thread 0 reads *barrier and sees both threads have entered and does the store, resetting the sense Thread 0 re-enters synchronize_threads and does the atomic-add Thread 1 resets the counter and sets sense to what Thread 0 already did <--- oops is here Thread 0 reads *barrier, goes into the else to wait Thread 1 enters synchronize_threads and does the atomic-add Thread 1 has erased Thread 0's atomic-add, so enters the else and waits with Thread 0 after reading *barrier, result = deadlock

rpgolshan commented 3 years ago

I see. LGTM.