Closed by sbromberger 1 year ago
Could this have been due to the type instability?
https://github.com/JuliaCollections/DataStructures.jl/pull/263
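One way to probe the type-instability question is to ask Julia whether the return type of pop! can be inferred. The snippet below is only a sketch under assumptions: it uses the BinaryMinHeap{T}() constructor from recent DataStructures.jl releases (not the binary_minheap function named in the original report) and the @inferred macro from the Test standard library, which throws an error when the inferred return type does not match the actual one.

```julia
using DataStructures   # assumption: a release that exports BinaryMinHeap
using Test             # standard library, provides @inferred

# Build a small heap; the element type Int is chosen only for illustration.
h = BinaryMinHeap{Int}()
push!(h, 3)
push!(h, 1)

# @inferred runs the call and errors if its return type cannot be inferred
# concretely; when inference succeeds it simply returns the popped value.
x = @inferred pop!(h)
```

If the call errors on a given version, that points at an inference problem in pop! for that heap type; if it passes, the slowdown likely has another cause.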
I've compared BinaryHeap vs MutableBinaryHeap using the benchmark script in bench_heap on master, and BinaryHeap is always faster, e.g.:
julia> results["BinaryHeap"]["pop"]
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Int64" => 2-element BenchmarkTools.BenchmarkGroup:
    tags: []
    "10^3" => 1-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "Min" => Trial(33.508 μs)
    "10^1" => 1-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "Min" => Trial(119.000 ns)
  "Float64" => 2-element BenchmarkTools.BenchmarkGroup:
    tags: []
    "10^3" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "SlowMin" => Trial(59.742 μs)
      "Min" => Trial(37.784 μs)
    "10^1" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "SlowMin" => Trial(136.000 ns)
      "Min" => Trial(119.000 ns)
julia> results["MutableBinaryHeap"]["pop"]
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "Int64" => 2-element BenchmarkTools.BenchmarkGroup:
    tags: []
    "10^3" => 1-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "Min" => Trial(49.894 μs)
    "10^1" => 1-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "Min" => Trial(178.000 ns)
  "Float64" => 2-element BenchmarkTools.BenchmarkGroup:
    tags: []
    "10^3" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "SlowMin" => Trial(74.890 μs)
      "Min" => Trial(47.017 μs)
    "10^1" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "SlowMin" => Trial(216.000 ns)
      "Min" => Trial(169.000 ns)
julia>
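For anyone who wants to reproduce a comparison like this without the repository's benchmark script, here is a minimal sketch. It is not the bench_heap script: the helper name drain is hypothetical, it times a full push!-then-pop! cycle rather than pop! alone, and it assumes the BinaryMinHeap{T}() and MutableBinaryMinHeap{T}() constructors from recent DataStructures.jl releases together with @belapsed from BenchmarkTools.

```julia
using DataStructures   # assumption: a release exporting BinaryMinHeap / MutableBinaryMinHeap
using BenchmarkTools

# Hypothetical helper: fill a heap of type H with `xs`, then pop every element.
function drain(::Type{H}, xs) where {H}
    h = H()
    for x in xs
        push!(h, x)
    end
    while !isempty(h)
        pop!(h)
    end
    return nothing
end

xs = rand(Int, 10^3)

# @belapsed returns the minimum observed time in seconds; `$xs` interpolates
# the global so that access to it is not part of the measurement.
t_plain   = @belapsed drain(BinaryMinHeap{Int}, $xs)
t_mutable = @belapsed drain(MutableBinaryMinHeap{Int}, $xs)

println("BinaryMinHeap:        ", t_plain, " s")
println("MutableBinaryMinHeap: ", t_mutable, " s")
```

Absolute numbers will differ by machine and package version, so only the ratio between the two timings is worth comparing against the figures above.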
I've been using mutable_binary_minheap for a Dijkstra calculation and am getting reasonable performance for betweenness centrality (which basically does all-pairs Dijkstra) in a large graph:
Realizing that I really didn't need the mutability (I'm just push!ing and pop!ping), I switched over to binary_minheap, thinking that I'd get at least the same, but possibly better, performance. However:
These results are repeatable (I ran each test 3 times).
Why would binary_minheap be so much slower than its mutable counterpart, and why would it require more than twice the number of allocations (note that the amount of memory allocated is also different, but maybe not significantly so)?
Thanks for any assistance / insight.
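To illustrate why a Dijkstra loop can get by with only push! and pop!, here is a minimal sketch using a non-mutable heap and skipping stale entries instead of updating keys. It is not the code from this report, which is not shown: the function name dijkstra_lazy and the adjacency-list graph layout are hypothetical, and it assumes the BinaryMinHeap{T}() constructor from recent DataStructures.jl releases.

```julia
using DataStructures   # assumption: a release exporting BinaryMinHeap

# Hypothetical graph layout: adj[u] is a vector of (neighbor, weight) pairs.
function dijkstra_lazy(adj::Vector{Vector{Tuple{Int,Float64}}}, src::Int)
    dist = fill(Inf, length(adj))
    dist[src] = 0.0
    # Entries are (distance, vertex); tuples compare by distance first.
    heap = BinaryMinHeap{Tuple{Float64,Int}}()
    push!(heap, (0.0, src))
    while !isempty(heap)
        d, u = pop!(heap)
        # Stale entry: a shorter distance to u was recorded after this push.
        d > dist[u] && continue
        for (v, w) in adj[u]
            nd = d + w
            if nd < dist[v]
                dist[v] = nd
                push!(heap, (nd, v))   # push a new entry instead of updating a key
            end
        end
    end
    return dist
end

# Small directed example graph with four vertices.
adj = [
    [(2, 1.0), (3, 4.0)],
    [(3, 2.0), (4, 6.0)],
    [(4, 3.0)],
    Tuple{Int,Float64}[],
]
dist = dijkstra_lazy(adj, 1)
```

The trade-off is that the heap may hold several entries per vertex, so it can grow larger than a mutable heap that updates keys in place.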