Open rgankema opened 3 years ago
Here's another, maybe a bit simpler, example, which shows the same effect as above using `@time`. In this program, you can see that timing one Task that doesn't have any compilation time, running concurrently with another Task on the same thread that does, reports the compilation time in the call to `@time`, even though it wasn't caused by that Task.

(Note that this example doesn't work if you copy/paste it into the REPL; I think that's because the REPL display output is being compiled at the same time as the program runs, so both `@time` calls show some compilation time. So instead, I run this example as a script.)
```julia
# async-time-example.jl
const running = Threads.Atomic{Bool}(true)
loop() = while running[] ; yield(); GC.safepoint(); end

# Run once as warmup
running[] = true
t = @async loop()
sleep(0.1) ; running[] = false
wait(t)
# ----------------------------
running[] = true
# Start timing a task, which doesn't have any compilation time.
t = @async @time loop()
# Finish timing the task. Note no compilation time.
sleep(0.1) ; running[] = false
wait(t)
# ----------------------------
running[] = true
# Now time the *same* loop again, which shouldn't have any compilation time.
t = @async @time while running[] ; yield(); GC.safepoint(); end
# But start another task *on the same thread,* which DOES spend some compilation time.
t2 = @async (@eval f() = 2+2; @eval f())
# Finish timing the original task, and observe that the compilation time from `t2`
# was recorded by the `@time` in `t`.
sleep(0.1) ; running[] = false
wait(t) ; wait(t2)
```
And here's the output:
```
$ julia ~/Downloads/async-time-example.jl
  0.079854 seconds (72 allocations: 4.594 KiB)
  0.093527 seconds (2.27 k allocations: 158.235 KiB, 4.59% compilation time)
```
The request here is to record compilation time via task-local storage, instead of through thread-local storage.
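To illustrate the task-local idea, per-task accounting can be modeled in user code with `task_local_storage` (this is only a sketch of the concept, not the requested runtime change; the helpers `charge_compile_time!` and `task_compile_time` are hypothetical names, not part of Julia):

```julia
# Sketch: model per-task accounting with task-local storage.
# `charge_compile_time!` and `task_compile_time` are hypothetical helpers;
# they stand in for what the runtime could do internally.

# Record `ns` nanoseconds against the *current* task only.
charge_compile_time!(ns) =
    task_local_storage(:compile_ns, get(task_local_storage(), :compile_ns, 0) + ns)

# Read back the time charged to the current task.
task_compile_time() = get(task_local_storage(), :compile_ns, 0)

t1 = @async begin
    charge_compile_time!(100)    # compilation triggered by this task
    task_compile_time()
end
t2 = @async task_compile_time()  # a sibling task on the same thread sees none of it

@show fetch(t1)  # 100
@show fetch(t2)  # 0
```

Because `task_local_storage` is keyed to the task rather than the thread, two tasks multiplexed on one thread keep independent counters, which is exactly the property `@time` lacks today.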
Worth noting that this is actually not unique to compilation time. The number of allocations is tracked globally as well, as shown in this simple example:
```julia
julia> function f(x::Array{Any}) x[1] + x[1] end
f (generic function with 1 method)

julia> t = @async @time sleep(10)
Task (runnable) @0x000000010b059000

julia> @time for _ in 1:3000000 f(Any[2]) end
  0.204893 seconds (3.00 M allocations: 274.658 MiB, 16.42% gc time)

julia> 10.005931 seconds (3.04 M allocations: 277.261 MiB, 0.34% gc time, 0.19% compilation time)
```
It would be great to change this for all the metrics in `@time`, though I don't know how feasible that is.
IIRC, the code is currently taking extra effort to make this counter global (instead of per-thread). The challenge may be deciding whether we should ever "charge" thread costs (time, memory, etc.) to the parent Tasks (i.e., all the tasks that called `wait` on it), and how to handle task thread-migration.
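To make the thread-migration point concrete, here is a small sketch showing that a `Threads.@spawn`ed task is not pinned to one thread (on Julia 1.7+), so purely thread-local counters can't cleanly be attributed to a single task. Whether migration actually happens on a given run depends on the scheduler and on how many threads Julia was started with (e.g. `julia -t 4`), so no particular outcome is guaranteed:

```julia
# Sketch: a task started with Threads.@spawn may be moved between threads
# at yield points (Julia 1.7+). Migration is scheduler-dependent and is
# not guaranteed to occur on any particular run.
t = Threads.@spawn begin
    first_tid = Threads.threadid()
    for _ in 1:100
        yield()        # give the scheduler a chance to move this task
        sleep(0.001)
    end
    (first_tid, Threads.threadid())
end
before, after = fetch(t)
println("started on thread $before, finished on thread $after")
```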
Tracking this discussion. I just came across the same issue :)
Thanks @vtjnash - that makes sense. Interesting!
Yeah, after thinking more about it, I can see why the global accounting is actually really valuable for some situations.
It seems like maybe we'd ideally want both types of metrics? In addition to the global numbers we currently report, it would be great to also be able to track task-local ones.
Does that make sense to you? That we could (have the option to) record both types of metrics: task-local (maybe including any child tasks that we wait on, as you say), and global?
@vtjnash -

> IIRC, the code is currently taking extra effort to make this counter global (instead of per-thread).

After looking at this a bit more, I don't think `jl_cumulative_compile_time_ns_before` is global, though. It's per-thread.
And actually, @janrous-rai just pointed out to me that I think there's a race condition here: if multiple Tasks scheduled on the same thread are both measuring with `@time`, the first one to finish will disable `jl_measure_compile_time` for that thread, and the other one will miss some of its compilation time because measurement was disabled out from under it! 😮
To fix the race condition, do you think we could at least change `jl_measure_compile_time` to be:
A) a thread-safe and/or atomic variable, and
B) a counter instead of a boolean? That way it would remain "true" (nonzero) as long as at least one Task on the thread is currently measuring compilation time.
Alternatively, is it really so expensive to measure the compilation time that we need to enable/disable it at all? Could we just keep a cumulative count of all compilation time, and remove the `jl_measure_compile_time` check entirely?
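A minimal sketch of the counter idea in plain Julia (the names here are hypothetical; the real flag lives in the C runtime): enabling measurement increments an atomic reference count and disabling decrements it, so measurement stays on while any measurer is still active:

```julia
# Sketch of option (A)+(B): replace the boolean enable/disable flag with an
# atomic reference count. Names are hypothetical; the real `jl_measure_compile_time`
# flag lives in the C runtime, not in Julia code.
const measurers = Threads.Atomic{Int}(0)

start_measuring!() = Threads.atomic_add!(measurers, 1)
stop_measuring!()  = Threads.atomic_sub!(measurers, 1)
measuring()        = measurers[] > 0

start_measuring!()    # first @time begins
start_measuring!()    # a second, overlapping @time begins
stop_measuring!()     # the first @time finishes early...
@show measuring()     # ...but measurement is still on: true
stop_measuring!()     # the second @time finishes
@show measuring()     # now off: false
```

With a plain boolean, the first `stop_measuring!` would have turned measurement off while the second `@time` was still running, which is exactly the race described above.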
For my work at RelationalAI I'm trying to gather metrics on how much time is spent in run-time compilation while evaluating queries. I was hoping to use `jl_cumulative_compile_time_ns_[before/after]` for this.

This works fine if there are no background tasks and queries run sequentially. It also works fine in a multi-threaded scenario, as long as only a single task runs at a time on any given thread, because `jl_cumulative_compile_time_ns_[before/after]` seem to use separate counters per thread. However, it breaks when multiple tasks are multiplexed on the same thread. For instance, if one call of `evaluate_query` has a very long-running `query_fn`, and other tasks that incur compilation time are running at the same time on that thread, that first `evaluate_query` call will also record all the compilation time triggered by those other tasks. This means that we're overestimating how much time we spent in compilation for that query, potentially by a very large margin.

To see this behavior in action, please consider the following MRE:
This outputs:
As you can see, the last call to `do_something_and_measure_compilation_time` recorded the compilation time of the background task. Unfortunately, that behavior makes it unusable for the use-case described above. We'd therefore like to request that `jl_cumulative_compile_time_ns_[before/after]` be changed so that it keeps a counter per task, rather than per thread. Thanks!
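For reference, the per-query measurement pattern described above can be sketched roughly like this (a sketch, not the actual RelationalAI code; it assumes a Julia version around 1.6 where the `jl_cumulative_compile_time_ns_[before/after]` entry points exist, and the names `evaluate_query` / `query_fn` just mirror the comment above):

```julia
# Rough sketch of the per-query compile-time measurement described above,
# assuming a Julia version (~1.6) that still exposes these runtime entry
# points. `evaluate_query` / `query_fn` mirror the names in the comment.
function evaluate_query(query_fn)
    before = ccall(:jl_cumulative_compile_time_ns_before, UInt64, ())
    result = query_fn()
    after  = ccall(:jl_cumulative_compile_time_ns_after, UInt64, ())
    compile_seconds = (after - before) / 1e9
    return result, compile_seconds
end

# Any compilation that happens on this *thread* while `query_fn` runs is
# counted -- including, problematically, compilation triggered by other
# tasks multiplexed onto the same thread.
result, compile_s = evaluate_query(() -> (@eval g() = 1 + 1; @eval g()))
println("query result = $result, compile time = $(compile_s)s")
```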