celerity / celerity-runtime

High-level C++ for Accelerator Clusters
https://celerity.github.io
MIT License

Don't let partial replication unnecessarily divide push commands #229

Closed: psalz closed this 10 months ago

psalz commented 10 months ago

This was surfaced by 2D splits, but can also be constructed (somewhat awkwardly) for 1D. The extra region map lookup into buffer_state.replicated_regions could sometimes cause a box with a single last-writer command to be unnecessarily split into several push commands. The fix is to collect all non-replicated boxes and do a final merge before generating the push commands.
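
The collect-then-merge idea can be illustrated with a minimal 1D sketch. This is not the code from the PR and does not use Celerity's box or region map types: `interval`, `replication_entry`, `collect_non_replicated` and `merge_intervals` are hypothetical names introduced only for illustration, and the real implementation works on N-dimensional boxes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical 1D stand-in for a buffer box: the half-open range [begin, end).
struct interval {
    std::size_t begin;
    std::size_t end;
};

// Hypothetical entry of a replication map: a range and the nodes already holding it.
struct replication_entry {
    interval range;
    std::set<int> replicated_on;
};

// Returns the parts of `written` that `target_node` does not hold yet.
// The result is fragmented along the boundaries of `entries`, even where
// neighboring fragments would end up in the same push.
std::vector<interval> collect_non_replicated(
    const interval& written, const std::vector<replication_entry>& entries, int target_node) {
    assert(written.begin <= written.end);
    std::vector<interval> fragments;
    for (const auto& entry : entries) {
        const std::size_t begin = std::max(written.begin, entry.range.begin);
        const std::size_t end = std::min(written.end, entry.range.end);
        if (begin >= end) continue;                                  // no overlap with this entry
        if (entry.replicated_on.count(target_node) != 0) continue;   // target already has the data
        fragments.push_back({begin, end});
    }
    return fragments;
}

// Final merge step: fuses touching or overlapping intervals so that each
// contiguous range yields one push instead of one push per fragment.
std::vector<interval> merge_intervals(std::vector<interval> fragments) {
    std::sort(fragments.begin(), fragments.end(),
        [](const interval& a, const interval& b) { return a.begin < b.begin; });
    std::vector<interval> merged;
    for (const auto& fragment : fragments) {
        if (!merged.empty() && fragment.begin <= merged.back().end) {
            merged.back().end = std::max(merged.back().end, fragment.end);
        } else {
            merged.push_back(fragment);
        }
    }
    return merged;
}
```

In this sketch, if one command last wrote [0, 100) and the replication map holds [0, 40) as replicated on node 2 and [40, 100) as replicated nowhere, then collecting for target node 1 yields two fragments. Generating pushes directly from those fragments would give two push commands, while merging first gives a single push for [0, 100).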

github-actions[bot] commented 10 months ago

Check-perf-impact results: (f289611903bd51c6db3f757a138e69d1)

:warning: Significant slowdown (>1.25x) in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / expanding tree topology

Relative execution time per category: (mean of relative medians)

github-actions[bot] commented 10 months ago

Check-perf-impact results: (4c65f1399a47e0eb1340f63004745b17)

:rocket: Significant speedup (<0.80x) in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / expanding tree topology

Relative execution time per category: (mean of relative medians)