When benchmarking a parallel application that uses Dagger, MemPool.approx_size appears to be the bottleneck, because it falls back to Base.summarysize.
Here is a quick MWE:
julia> using BenchmarkTools, DataFrames, MemPool
julia> df = DataFrame(a=1:1_000_000, b=randn(1_000_000), c=repeat([:aa], 1_000_000));
julia> @benchmark MemPool.approx_size($df)
BenchmarkTools.Trial:
memory estimate: 61.03 MiB
allocs estimate: 1999540
--------------
minimum time: 110.895 ms (4.59% GC)
median time: 119.604 ms (2.47% GC)
mean time: 122.978 ms (2.83% GC)
maximum time: 146.009 ms (1.46% GC)
--------------
samples: 41
evals/sample: 1
Here is a sketch of an alternative implementation which is much faster:
julia> function MemPool.approx_size(df::DataFrame)
           dsize = mapreduce(MemPool.approx_size, +, eachcol(df))
           namesize = mapreduce(MemPool.approx_size, +, names(df))
           return dsize + namesize
       end
julia> @benchmark MemPool.approx_size($df)
BenchmarkTools.Trial:
memory estimate: 704 bytes
allocs estimate: 13
--------------
minimum time: 535.700 μs (0.00% GC)
median time: 636.800 μs (0.00% GC)
mean time: 664.967 μs (0.00% GC)
maximum time: 1.525 ms (0.00% GC)
--------------
samples: 7499
evals/sample: 1
The above implementation is not 100% correct, but I hope it shows that there is some potential for improvement.
I don't know whether there is an interface, e.g. Tables.jl, that could be used to avoid a direct DataFrames dependency.
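For illustration, here is a minimal sketch of what a Tables.jl-based generic method might look like. The function name approx_size_table is hypothetical, and like the DataFrame method above it only sums column and name sizes, ignoring any per-object overhead:

```julia
using MemPool, Tables

# Hypothetical sketch: size up any Tables.jl-compatible source column by column,
# so MemPool would only need a lightweight Tables.jl dependency, not DataFrames.
function approx_size_table(t)
    cols = Tables.columns(t)  # column-accessible view of the table
    dsize = sum(MemPool.approx_size(Tables.getcolumn(cols, n))
                for n in Tables.columnnames(cols))
    namesize = sum(MemPool.approx_size(String(n))
                   for n in Tables.columnnames(cols))
    return dsize + namesize
end
```

Since DataFrames implements the Tables.jl interface, this would cover DataFrame (and many other table types) without MemPool taking on the heavier dependency.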