celerity / celerity-runtime

High-level C++ for Accelerator Clusters
https://celerity.github.io
MIT License
139 stars 18 forks source link

Detect uninitialized reads and overlapping writes #224

Closed fknorr closed 9 months ago

fknorr commented 10 months ago

During IDAG development, asserting that instructions do not read from uninitialized memory turned out to be an invaluable tool for finding bugs in dependency tracking and data migration. This PR is a back-port of that detection logic, extended to also find overlapping writes between chunks on CDAG generation.

By default, task_manager and distributed_graph_generator throw when detecting an uninitialized read or overlapping writes in order to catch internal errors and report tests with bogus access patterns. In the runtime, results from these checks are reported through warning (uninitialized read) and error (overlapping writes) logs instead - both for backwards compatibillity, and because uninitialized reads are expected when submitting kernels with read_write access in a loop (see also celerity/meta#74).

All tests with incorrect access patterns were fixed, and the reporting feature is tested both with the TDAG / CDAG test context infrastructure as well as the actual runtime.

github-actions[bot] commented 10 months ago

no optimizations

Check-perf-impact results: (1d1f40270fc25a329bb16053c8fa15f2)

:warning: Significant slowdown (>1.25x) in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / expanding tree topology, generating large command graphs for N nodes - 16 / wave_sim topology, benchmark independent task pattern with N tasks - 100 / task generation
:rocket: Significant speedup (<0.80x) in some microbenchmark results: normalizing a fully mergeable tiling of boxes - 1 / large, embedded in 3d

Relative execution time per category: (mean of relative medians)

github-actions[bot] commented 10 months ago

distributed overlapping-write detection

Check-perf-impact results: (7e3c8bfa480dfad2a4f08c4fabe32cc2)

:warning: Significant slowdown (>1.25x) in some microbenchmark results: normalizing randomized box sets - 3d / small - native, building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / expanding tree topology, benchmark independent task pattern with N tasks - 100 / task generation
:rocket: Significant speedup (<0.80x) in some microbenchmark results: benchmark stencil pattern with N time steps - 50 / iterations

Relative execution time per category: (mean of relative medians)

github-actions[bot] commented 10 months ago

distributed overlapping-write detection + bounding-box uninitialized-read detection

Check-perf-impact results: (385bf6fbfe375253106eaa9f9b53a88e)

:warning: Significant slowdown (>1.25x) in some microbenchmark results: benchmark independent task pattern with N tasks - 100 / task generation
:rocket: Significant speedup (<0.80x) in some microbenchmark results: benchmark stencil pattern with N time steps - 1000 / iterations

Relative execution time per category: (mean of relative medians)

fknorr commented 10 months ago

I managed to optimize this a bit further, but CDAG generation still takes a 10% performance hit.

github-actions[bot] commented 9 months ago

disabling diagnostics by default in release builds

Check-perf-impact results: (9c7d95e3b2850796c2d810616c0f1aff)

:warning: Significant slowdown (>1.25x) in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / expanding tree topology
:rocket: Significant speedup (<0.80x) in some microbenchmark results: normalizing a fully mergeable tiling of boxes - 1 / large, embedded in 3d

Relative execution time per category: (mean of relative medians)

github-actions[bot] commented 9 months ago

do not maintain initialized-region when diagnostics are disabled

Check-perf-impact results: (a1f01e5fe178a68d42a7063603612487)

:warning: Significant slowdown (>1.25x) in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / expanding tree topology
:rocket: Significant speedup (<0.80x) in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / chain topology

Relative execution time per category: (mean of relative medians)

fknorr commented 9 months ago

I have turned the task-manager / graph-generator policy settings into constructor arguments; that way I can skip updating the initialized-region of a buffer when diagnostics are disabled (previously this region had to be maintained in anticipation of a change in polices, which of course never happened). I've also added a bit more documentation.

github-actions[bot] commented 9 months ago

Check-perf-impact results: (3b34e58e3c100f4c3541a1ed59580f72)

:question: No new benchmark data submitted. :question:
Please re-run the microbenchmarks and include the results if your commit could potentially affect performance.

github-actions[bot] commented 9 months ago

Check-perf-impact results: (54ff34045c40bf12cbc2ca95f475792b)

:warning: Significant slowdown (>1.25x) in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / expanding tree topology
:rocket: Significant speedup (<0.80x) in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / chain topology, building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / contracting tree topology

Relative execution time per category: (mean of relative medians)

fknorr commented 9 months ago

I have used this opportunity to refactor how task debug names are generated from class KernelName types and printed in debug outputs. Unnamed kernels now do not show "unnamed_kernel" as their name anymore and the task type is visualized as well.

There is now a new step_builder::name() function to conveniently set task debug names from a distributed_graph_generator_test_context.