Switch HM_EBR (management/retirement of entangled chunks) to assume the non-quiescent state in the common case.
This lets the read barrier skip quiescence maintenance entirely, avoiding that overhead.
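To make the trade-off concrete, here is a minimal, single-threaded sketch of a generic epoch-based reclamation scheme. It is not the HM_EBR code: every `sketch_` name, the announcement encoding, and the processor count are assumptions made for illustration. It contrasts a barrier that brackets each access with quiescence announcements against one that does no bookkeeping and leaves the announcement to an occasional safe point.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NUM_PROCS 4 /* hypothetical processor count */

/* Hypothetical state. Each announcement word is (epoch << 1) | quiescent_bit. */
typedef struct {
  atomic_uint_fast64_t global_epoch;
  atomic_uint_fast64_t announce[SKETCH_NUM_PROCS];
  uint64_t barrier_stores; /* announcement writes issued by read barriers */
} sketch_ebr;

void sketch_init(sketch_ebr *e, bool start_quiescent) {
  atomic_init(&e->global_epoch, 0);
  for (int p = 0; p < SKETCH_NUM_PROCS; p++)
    atomic_init(&e->announce[p], start_quiescent ? 1u : 0u);
  e->barrier_stores = 0;
}

/* Quiescent by default: every barrier announces entry and exit
   around the access it protects, costing two stores per barrier. */
void sketch_barrier_quiescent_default(sketch_ebr *e, int p) {
  assert(p >= 0 && p < SKETCH_NUM_PROCS);
  uint_fast64_t g = atomic_load(&e->global_epoch);
  atomic_store(&e->announce[p], g << 1);       /* leave quiescent state */
  /* ... the access to the entangled object would happen here ... */
  atomic_store(&e->announce[p], (g << 1) | 1); /* re-enter quiescent state */
  e->barrier_stores += 2;
}

/* Non-quiescent by default: the barrier does no quiescence bookkeeping. */
void sketch_barrier_nonquiescent_default(sketch_ebr *e, int p) {
  assert(p >= 0 && p < SKETCH_NUM_PROCS);
  (void)e;
  /* ... only the access itself would happen here ... */
}

/* The bookkeeping moves to an occasional safe point, where the
   processor catches up to the current global epoch. */
void sketch_safe_point(sketch_ebr *e, int p) {
  assert(p >= 0 && p < SKETCH_NUM_PROCS);
  atomic_store(&e->announce[p], atomic_load(&e->global_epoch) << 1);
}

/* The epoch advances only when every processor is either quiescent
   or has announced the current epoch. */
bool sketch_try_advance(sketch_ebr *e) {
  uint_fast64_t g = atomic_load(&e->global_epoch);
  for (int p = 0; p < SKETCH_NUM_PROCS; p++) {
    uint_fast64_t a = atomic_load(&e->announce[p]);
    bool quiescent = (a & 1) != 0;
    if (!quiescent && (a >> 1) != g)
      return false;
  }
  atomic_store(&e->global_epoch, g + 1);
  return true;
}
```

In this sketch the cost of the non-quiescent default is that a processor which has not reached a safe point holds back the epoch, so reclamation can be delayed in exchange for a cheaper barrier.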
This appears to be very important for certain benchmarks, and may have been contributing to our performance dip problem. For example, the entangled linden-pq benchmark no longer experiences a perf dip near 50 procs, and is now over 2x faster on 72 procs.
Performance of disentangled benchmarks appears mostly unaffected by this patch.
(Some follow-up thoughts: perhaps we should revisit the EBR implementation, especially rotateAndReclaim, which introduces an unknown delay. The actual work of iterating through the limbo bag and freeing individual elements could be parallelized and spread across the span, rather than scheduled immediately onto the current processor, as is done now...)
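One possible shape for that follow-up, as a rough sketch only: split the limbo bag into disjoint slices so that each slice can be freed independently, at a different time or by a different worker. The names `sketch_slice_bounds` and `sketch_reclaim_slice` are hypothetical, the bag is modeled as a plain array of `malloc`ed pointers, and the scheduling of slices is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Split the index range [0, count) into `parts` contiguous slices.
   Slice i is [*lo, *hi). Slices are disjoint and together cover the range. */
void sketch_slice_bounds(size_t count, size_t parts, size_t i,
                         size_t *lo, size_t *hi) {
  assert(parts > 0 && i < parts);
  *lo = count * i / parts;
  *hi = count * (i + 1) / parts;
}

/* Free the retired items in [lo, hi) and return how many were freed.
   Because slices do not overlap, separate calls touch separate items. */
size_t sketch_reclaim_slice(void **items, size_t lo, size_t hi) {
  assert(items != NULL && lo <= hi);
  for (size_t k = lo; k < hi; k++) {
    free(items[k]);
    items[k] = NULL; /* mark the slot as reclaimed */
  }
  return hi - lo;
}
```

A caller would compute the bounds for each slice and hand the slices out as separate tasks, instead of walking the whole bag on the processor that triggered reclamation.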