Closed AntoineBut closed 2 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 97.31%. Comparing base (
615afe4
) to head (0104925
).
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@AntoineBut can you check what my change from BitSet
to Set
does in terms of performance?
It might be faster in theory but slower in practice, at least on instances of moderate size.
@gdalle Here are results from a rapid benchmarking :
Setup : 3 types of synthetic graphs, all in 3 sizes : 1 graph imported from Snap
julia> doro_small = dorogovtsev_mendes(10)
{10, 17} undirected simple Int64 graph
julia> doro_med = dorogovtsev_mendes(300)
{300, 597} undirected simple Int64 graph
julia> doro_large = dorogovtsev_mendes(100_000)
{100000, 199997} undirected simple Int64 graph
julia> bara_small = barabasi_albert(10, 2)
{10, 16} undirected simple Int64 graph
julia> bara_med = barabasi_albert(300, 2)
{300, 596} undirected simple Int64 graph
julia> bara_large = barabasi_albert(100_000, 2)
{100000, 199996} undirected simple Int64 graph
julia> complete_small = complete_graph(10)
{10, 45} undirected simple Int64 graph
julia> complete_med = complete_graph(300)
{300, 44850} undirected simple Int64 graph
julia> complete_large = complete_graph(10_000)
{10000, 49995000} undirected simple Int64 graph
# Real-world data graph
julia> twitch_graph = loadgraph("/Users/antoine/Documents/GitHub/ParallelGraphs.jl/benchmark/data/large_twitch_edges.csv", "twitch", EdgeListFormat())
{168116, 6797558} directed simple Int64 graph
Results :
Set{T} | BitSet | |
---|---|---|
Barabasi Albert | ||
Small | 662.556 ns (9 allocations: 928 bytes) | 411.692 ns (6 allocations: 624 bytes) |
Med | 13.791 μs (10 allocations: 11.28 KiB) | 7.208 μs (7 allocations: 10.98 KiB) |
Large | 7.714 ms (15 allocations: 3.15 MiB) | 4.736 ms (12 allocations: 3.15 MiB) |
Dorogostev Mendes | ||
Small | 638.643 ns (9 allocations: 928 bytes) | 404.084 ns (6 allocations: 624 bytes) |
Med | 13.958 μs (10 allocations: 11.28 KiB) | 7.229 μs (7 allocations: 10.98 KiB) |
Large | 7.721 ms (15 allocations: 3.15 MiB) | 4.740 ms (12 allocations: 3.15 MiB) |
Complete Graph | ||
Small | 827.465 ns (9 allocations: 928 bytes) | 479.492 ns (6 allocations: 624 bytes) |
Med | 378.917 μs (18 allocations: 21.38 KiB) | 128.459 μs (7 allocations: 8.44 KiB) |
Large | 457.570 ms (30 allocations: 438.27 KiB) | 123.255 ms (12 allocations: 246.50 KiB) |
Twitch Graph | ||
46.081 ms (18 allocations: 5.29 MiB) | 31.722 ms (12 allocations: 5.29 MiB) |
I tried to represent graphs that behave differently with respect to coloring, some of them are colored with only 5 colors (even on thousands of vertices) and the complete graph is colored in N colors. BitSet was faster in all configurations I tried, with speedup ranging from x1,4 to x3,5.
I hope that adresses your concerns! Let me know if there is anything else :)
Just out of curiosity, could you try with a BitVector? Then we'll pick the best and merge. My gut tells me it may be as fast as BitSet sometimes
This PR adresses #378 .
Replacing Vector with a Set yields immediate speedup of 50-100x on my machine.
As the we are using dense sets of integers from 1 to the number of colors to color the graph (i.e relatively small integers), BitSet implementation seems to be the most appropriate here. (and is also faster on my machine, even though I have not tested on an extensive benchmark suite)
Im open to feedback of course (I am still pretty new here) :))