Closed jeffhammond closed 6 months ago
@tgmattso if you are looking for an excuse to write examples of OpenMP atomics, this is a good one.
Yes, this code is wrong. It works on x86 since we apply relaxed atomics in the x86 memory model. The current code is based on the very old, original OpenMP. We realized we needed a more flexible mechanism built around atomics, which we added in OpenMP 4.0. I haven't tested this yet in syncP2P, but the right pattern is the following.
    if (TID==0) { /* first thread waits for corner value to be copied */
      while (1) {
        #pragma omp atomic read seq_cst
        flg_tmp = flag(0,0);
        if (flg_tmp == true) break;
      }
    #if SYNCHRONOUS
      #pragma omp atomic write seq_cst
      flag(0,0) = true;
    #endif
    }
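For anyone who wants to try the idea outside of PRK, here is a minimal standalone sketch of the same flag handoff. It is not the PRK code: `handoff`, `payload`, `flag`, and `received` are hypothetical names made up for illustration. It assumes an OpenMP 4.0 or later compiler, where `seq_cst` on an atomic implies a flush, so the plain write to `payload` should be visible once the consumer sees the flag set. The serial fallback is also my addition, so the function still returns if the build has no OpenMP or the runtime grants only one thread.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper, not from the PRK source: thread 0 publishes a
   payload and then raises a flag; thread 1 spins on the flag and only
   then reads the payload. Returns the value thread 1 observed. */
int handoff(int value)
{
    int payload = 0;    /* ordinary shared data guarded by the flag */
    int flag = 0;       /* 0 = not ready, 1 = ready */
    int received = -1;  /* what the consumer ended up reading */

    #pragma omp parallel num_threads(2) shared(payload, flag, received)
    {
        int tid = 0;
        int nthreads = 1;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        if (nthreads < 2) {
            /* Assumption: fall back to a serial handoff when there is no
               second thread, so the spin loop below cannot hang. */
            payload = value;
            received = payload;
        } else if (tid == 0) {
            /* Producer: write the data first, then publish the flag. */
            payload = value;
            #pragma omp atomic write seq_cst
            flag = 1;
        } else {
            /* Consumer: spin on an atomic read of the flag. */
            int ready = 0;
            while (!ready) {
                #pragma omp atomic read seq_cst
                ready = flag;
            }
            received = payload;
        }
    }
    return received;
}
```

Built with something like `-fopenmp`, calling `handoff(42)` should hand the value from thread 0 to thread 1 and return it. The point of the sketch is only the ordering: data write, then atomic flag write on one side; atomic flag read, then data read on the other.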
Sorry, I accidentally marked this as closed. It's not closed until I test and verify the code.
Geez, who wrote that shitty code?
I don't know. :)
But I take considerable blame for this, since for many years that is how I told people to handle point-to-point synchronization in OpenMP. The real problem is the OpenMP specification prior to OpenMP 4.0. Those of us creating it didn't know better and created an API with insufficient atomics to write such code. Shame on them.
What's funny is that as I was implementing the synch_p2p code I looked for examples of the type of synchronization needed and landed on the LU NAS Parallel Benchmark. I soon discovered it was wrong and had been wrong for years. I was very proud of my discovery and made sure not to repeat the mistake with synch_p2p. Or so I thought ...
I finally got around to fixing this. Hopefully at least one of you has a chance to review it.
Nondeterministic failures in OpenMP Synch_p2p on Graviton 3 suggest the code depends on x86 memory model behavior and needs fixing.
This pattern is almost certainly wrong in general (i.e., on anything other than x86).