Closed by psalz 10 months ago
Check-perf-impact results: (f289611903bd51c6db3f757a138e69d1)
:warning: Significant slowdown (>1.25x) in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / expanding tree topology
Relative execution time per category: (mean of relative medians)
Check-perf-impact results: (4c65f1399a47e0eb1340f63004745b17)
:rocket: Significant speedup (<0.80x) in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / expanding tree topology
Relative execution time per category: (mean of relative medians)
This was surfaced by 2D splits, but can also be constructed (somewhat awkwardly) for 1D: the extra region map lookup into `buffer_state.replicated_regions` could sometimes cause a box with the same last writer command to be unnecessarily split into several push commands. The fix is to collect all non-replicated boxes and do a final merge before generating the push commands.
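For illustration, here is a minimal 1D sketch of the "collect, then merge" idea. It is not the code from this PR: `box1d`, `fragment`, `merge_boxes` and `boxes_to_push` are hypothetical names, and boxes are simplified to half-open intervals. It assumes the two region map lookups have already produced a list of fragments, each tagged with its last writer and whether it is already replicated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical 1D stand-in for a box: the half-open interval [min, max).
struct box1d {
    std::size_t min;
    std::size_t max;
};

// Merge overlapping or directly adjacent intervals into maximal ones.
std::vector<box1d> merge_boxes(std::vector<box1d> boxes) {
    std::sort(boxes.begin(), boxes.end(),
        [](const box1d& a, const box1d& b) { return a.min < b.min; });
    std::vector<box1d> merged;
    for(const auto& b : boxes) {
        assert(b.min < b.max && "boxes must be non-empty");
        if(!merged.empty() && b.min <= merged.back().max) {
            // Touches or overlaps the previous interval: extend it.
            merged.back().max = std::max(merged.back().max, b.max);
        } else {
            merged.push_back(b);
        }
    }
    return merged;
}

// A fragment as it might come out of the region map lookups (assumption).
struct fragment {
    box1d box;
    int last_writer; // id of the command that last wrote this box
    bool replicated; // true if the data is already replicated, so no push is needed
};

// Collect all non-replicated boxes per last writer, then merge once at the end.
// Each box in the result would correspond to one push command.
std::map<int, std::vector<box1d>> boxes_to_push(const std::vector<fragment>& fragments) {
    std::map<int, std::vector<box1d>> per_writer;
    for(const auto& f : fragments) {
        if(!f.replicated) per_writer[f.last_writer].push_back(f.box);
    }
    for(auto& entry : per_writer) {
        entry.second = merge_boxes(std::move(entry.second));
    }
    return per_writer;
}
```

With this sketch, two non-replicated fragments `[0, 40)` and `[40, 100)` that share a last writer collapse into a single box `[0, 100)`, so only one push would be generated instead of two. Merging boxes in 2D or 3D is more involved than this interval merge; the sketch only covers the 1D case.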