fknorr closed 1 month ago
Check-perf-impact results: (877795252c9a57f7b343e4747db6ca4f)
:warning: Significant slowdown (>1.25x) in some microbenchmark results: 7 individual benchmarks affected
:heavy_plus_sign: Added microbenchmark(s): 48 individual benchmarks affected
:heavy_minus_sign: Removed microbenchmark(s): 48 individual benchmarks affected
Relative execution time per category: (mean of relative medians)
Files with Coverage Reduction | New Missed Lines | %
---|---|---
src/task.cc | 1 | 92.06%

Totals | |
---|---|
Change from base Build 10143808743: | 1.8% |
Covered Lines: | 6564 |
Relevant Lines: | 6700 |
Check-perf-impact results: (f2e639c8a97550e58528a410c1b8586d)
:warning: Significant slowdown (>1.25x) in some microbenchmark results: 8 individual benchmarks affected
:heavy_plus_sign: Added microbenchmark(s): 48 individual benchmarks affected
:heavy_minus_sign: Removed microbenchmark(s): 48 individual benchmarks affected
Relative execution time per category: (mean of relative medians)
Edit: We inadvertently disabled mimalloc. All hail the benchmark suite!
Check-perf-impact results: (2908f97f836fd2def14c3429cd4d61ac)
:warning: Significant slowdown (>1.25x) in some microbenchmark results: 5 individual benchmarks affected
:rocket: Significant speedup (<0.80x) in some microbenchmark results: generating large command graphs for N nodes - 1 / chain topology
:heavy_plus_sign: Added microbenchmark(s): 48 individual benchmarks affected
:heavy_minus_sign: Removed microbenchmark(s): 48 individual benchmarks affected
Relative execution time per category: (mean of relative medians)
This is the final PR in the IDAG series. It switches to the new IDAG-based runtime and drops all newly unused legacy components.

- `runtime` now manages multiple devices, and the `distr_queue` API has been updated to reflect the fact.
- `buffer_manager`, `reduction_manager` and `host_object_manager` are now gone. ID assignment for these types is now handled by the runtime directly, and all components interacting with buffers, reductions and host objects (graph generators, executor and recorders) track the relevant state themselves as instructed by `notify_*_created` / `_destroyed` introduced in #246. As a result, tasks do not need to keep strong references to buffers and host objects around anymore (`lifetime_extending_state`).
- `scheduler` now generates both the command- and the instruction graph in the same thread, and maintains ownership of both structures. The CDAG is pruned at generation time (since commands never leave the scheduler thread), and the IDAG is pruned once the scheduler is notified of epoch completion. Command serialization is gone, and with it, the command `is_flushed` marker.
- `runtime` now uses `live_executor` (replacing `legacy_executor` and `worker_job`) together with a `communicator` and `backend` instance to execute instructions.
- `communicator` together with `receive_arbiter` replace `buffer_transfer_manager`.
- `backend` implementations replace `legacy_backend`, `host_queue` and `device_queue`.
- `~distr_queue` will continue to epoch-synchronize.
- `runtime` asserts that non-thread-safe functions are only called from the application thread, which will trigger on accidental value-captures of buffers / host objects into host tasks.
- `#ifdefs` in tests and frontend code.
- `log_context`, which was only used by `worker_job`, is removed.
- `vendor/ctpl`, which was only used by `host_queue`, is removed.

Since one node now addresses multiple GPUs, scheduling becomes more expensive (IDAG generation is maybe ~4x as expensive as CDAG generation). This will be visible in benchmark results.
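
To illustrate the IDAG pruning described above (the scheduler owns the graph and prunes it once it is notified of epoch completion), here is a minimal, self-contained sketch. It is not the code from this PR: `toy_instruction_graph`, `emplace`, `prune_before_epoch` and `contains` are hypothetical names, and the sketch assumes that instruction ids increase monotonically in generation order and that everything generated before a completed epoch can be dropped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for the scheduler-owned instruction graph.
// Only ids are stored; real instructions carry dependencies and payloads.
class toy_instruction_graph {
  public:
    // Appends a new instruction and returns its id (assumed monotonically increasing).
    std::uint64_t emplace() {
        const std::uint64_t id = m_next_id++;
        m_instructions.push_back(id);
        return id;
    }

    // Called when the scheduler learns that the epoch with id `epoch_id` has completed.
    // Drops every instruction generated before the epoch; the epoch itself is kept
    // so that later instructions still have a predecessor to refer to.
    void prune_before_epoch(const std::uint64_t epoch_id) {
        assert(epoch_id < m_next_id); // the epoch must have been generated already
        while(!m_instructions.empty() && m_instructions.front() < epoch_id) {
            m_instructions.pop_front();
        }
    }

    std::size_t size() const { return m_instructions.size(); }

    bool contains(const std::uint64_t id) const {
        return std::find(m_instructions.begin(), m_instructions.end(), id) != m_instructions.end();
    }

  private:
    std::deque<std::uint64_t> m_instructions; // ordered by id, oldest first
    std::uint64_t m_next_id = 0;
};
```

Because the sketch keeps generation and pruning in one object used from one thread, no locking is needed, which mirrors the single-owner design mentioned above. How the actual scheduler tracks epochs and instruction lifetimes may differ.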