It seems that compilation time is not computed correctly when a function runs multi-threaded: the reported value appears to increase with the number of threads. For example, take the standard sum example from the docs:
julia> function sum_single(a)
           s = 0
           for i in a
               s += i
           end
           s
       end
sum_single (generic function with 1 method)
julia> function sum_multi_good(a)
           chunks = Iterators.partition(a, length(a) ÷ Threads.nthreads())
           tasks = map(chunks) do chunk
               Threads.@spawn sum_single(chunk)
           end
           chunk_sums = fetch.(tasks)
           return sum_single(chunk_sums)
       end
sum_multi_good (generic function with 1 method)
If I start Julia with 4 threads, I get:
Now if I start with 8 threads, but increase the workload by a factor of 2, I get:
And now with 16 threads:
Note how the execution time of the first run is similar in all cases, but the reported compilation time keeps increasing.
With 32 threads I get:
(my computer does not have that many cores).
These compilation times do not seem to make sense.
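For anyone who wants to try this, the measurements above could probably be reproduced with a script along the following lines. This is only a sketch: it assumes the timings are taken with `@time`, and the input range and the way the workload scales with the thread count (`n` below) are illustrative choices, not necessarily what was used for the numbers in this report.

```julia
# Sketch of a reproduction script. Save as e.g. repro.jl (hypothetical name) and run
# with different thread counts:  julia -t 4 repro.jl,  julia -t 8 repro.jl, ...

function sum_single(a)
    s = 0
    for i in a
        s += i
    end
    s
end

function sum_multi_good(a)
    # One chunk per thread, each summed in its own task
    chunks = Iterators.partition(a, length(a) ÷ Threads.nthreads())
    tasks = map(chunks) do chunk
        Threads.@spawn sum_single(chunk)
    end
    chunk_sums = fetch.(tasks)
    return sum_single(chunk_sums)
end

# Assumption: the workload grows proportionally to the number of threads,
# so the per-thread work stays roughly constant.
n = 1_000_000 * Threads.nthreads()

println("threads = ", Threads.nthreads())
@time sum_multi_good(1:n)  # first call: reported time includes compilation
@time sum_multi_good(1:n)  # second call: already compiled, for comparison
```

Comparing the "compilation time" percentage printed by the first `@time` call across thread counts should show whether the reported value scales with `Threads.nthreads()`.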