MPLLang / mpl

The MaPLe compiler for efficient and scalable parallel functional programming

Improve performance of entangled read barriers #167

Closed by shwestrick 1 year ago

shwestrick commented 1 year ago

Switch HM_EBR (management/retirement of entangled chunks) to assume the non-quiescent state in the common case.

This lets the read barrier skip quiescence maintenance entirely, eliminating that overhead.
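To make the cost difference concrete, here is a toy model, not the MPL runtime code. `ToyEBR`, `enter`, `leave`, and both read functions are hypothetical names made up for illustration. The only point is to count how many quiescence updates each style of read barrier performs.

```python
class ToyEBR:
    """Toy epoch-based reclamation bookkeeping (hypothetical, not HM_EBR)."""

    def __init__(self, num_procs):
        # Per-processor announcement: the epoch observed, or None if quiescent.
        self.announced = [None] * num_procs
        self.global_epoch = 0
        self.maintenance_ops = 0  # number of announcement updates performed

    def enter(self, proc):
        """Mark proc as non-quiescent at the current epoch."""
        self.announced[proc] = self.global_epoch
        self.maintenance_ops += 1

    def leave(self, proc):
        """Mark proc as quiescent again."""
        self.announced[proc] = None
        self.maintenance_ops += 1


def read_quiescent_by_default(ebr, proc, cell):
    # Barrier does the quiescence maintenance itself on every read.
    ebr.enter(proc)
    value = cell[0]
    ebr.leave(proc)
    return value


def read_nonquiescent_by_default(ebr, proc, cell):
    # Barrier assumes proc is already non-quiescent: no maintenance here.
    return cell[0]


def count_maintenance(read, num_reads, enter_once):
    """Run num_reads reads on one processor and return the update count."""
    ebr = ToyEBR(num_procs=1)
    cell = [42]
    if enter_once:
        ebr.enter(0)  # single announcement outside the read barrier
    for _ in range(num_reads):
        read(ebr, 0, cell)
    return ebr.maintenance_ops


old_cost = count_maintenance(read_quiescent_by_default, 1000, enter_once=False)
new_cost = count_maintenance(read_nonquiescent_by_default, 1000, enter_once=True)
print(old_cost, new_cost)
```

In this model the per-read cost moves out of the barrier. A processor that stays non-quiescent still needs some other point at which to announce quiescence so that reclamation can make progress; where that happens in MPL is not described in this thread.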

This appears to be very important for certain benchmarks, and may have contributed to the performance dip we have been seeing. For example, the entangled linden-pq benchmark no longer shows a performance dip near 50 procs, and is now over 2x faster on 72 procs.

Performance of disentangled benchmarks appears mostly unaffected by this patch.

(Some follow-up thoughts: perhaps we should revisit the EBR implementation, especially rotateAndReclaim, which introduces an unknown delay. The actual work of iterating through the limbo bag and freeing individual elements could be parallelized and spread across the span, rather than being scheduled immediately onto the current processor, as is done now.)
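A rough sketch of the follow-up idea, assuming the limbo bag can be treated as a plain list of retired elements. `split_into_batches`, `free_batch`, and `reclaim_in_parallel` are hypothetical names, and a thread pool stands in for whatever scheduler would actually run the work. This shows only the shape of splitting the freeing work into independent tasks, not how rotateAndReclaim works.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(limbo_bag, batch_size):
    """Cut the bag into consecutive batches of at most batch_size elements."""
    return [limbo_bag[i:i + batch_size]
            for i in range(0, len(limbo_bag), batch_size)]


def free_batch(batch):
    """Stand-in for freeing each retired element; returns how many were freed."""
    freed = 0
    for _element in batch:
        freed += 1  # a real runtime would release the element here
    return freed


def reclaim_in_parallel(limbo_bag, batch_size, workers):
    """Free the batches as independent tasks instead of one sequential loop."""
    batches = split_into_batches(limbo_bag, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(free_batch, batches))


total_freed = reclaim_in_parallel(list(range(10)), batch_size=3, workers=4)
print(total_freed)
```

The batch size controls the granularity: smaller batches spread the work more evenly but add scheduling overhead per task.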

shwestrick commented 1 year ago

I just confirmed this improves the performance of the entangled hash-dedup benchmark by about 2x on 72 procs, with no overall space increase.

🎉

shwestrick commented 1 year ago

This is subsumed by #168, which was just merged.