JuliaGraphs / Graphs.jl

An optimized graphs package for the Julia programming language
http://juliagraphs.org/Graphs.jl/
Other
451 stars 87 forks source link

Replaced Vector data structure by BitSet in greedy_color #379

Closed AntoineBut closed 2 months ago

AntoineBut commented 2 months ago

This PR adresses #378 .

Replacing Vector with a Set yields immediate speedup of 50-100x on my machine.

As the we are using dense sets of integers from 1 to the number of colors to color the graph (i.e relatively small integers), BitSet implementation seems to be the most appropriate here. (and is also faster on my machine, even though I have not tested on an extensive benchmark suite)

Im open to feedback of course (I am still pretty new here) :))

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 97.31%. Comparing base (615afe4) to head (0104925).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #379 +/- ## ======================================= Coverage 97.31% 97.31% ======================================= Files 120 120 Lines 6953 6954 +1 ======================================= + Hits 6766 6767 +1 Misses 187 187 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

gdalle commented 2 months ago

@AntoineBut can you check what my change from BitSet to Set does in terms of performance? It might be faster in theory but slower in practice, at least on instances of moderate size.

AntoineBut commented 2 months ago

@gdalle Here are results from a rapid benchmarking :

Setup : 3 types of synthetic graphs, all in 3 sizes : 1 graph imported from Snap

julia> doro_small = dorogovtsev_mendes(10)
{10, 17} undirected simple Int64 graph

julia> doro_med = dorogovtsev_mendes(300)
{300, 597} undirected simple Int64 graph

julia> doro_large = dorogovtsev_mendes(100_000)
{100000, 199997} undirected simple Int64 graph

julia> bara_small = barabasi_albert(10, 2)
{10, 16} undirected simple Int64 graph

julia> bara_med = barabasi_albert(300, 2)
{300, 596} undirected simple Int64 graph

julia> bara_large = barabasi_albert(100_000, 2)
{100000, 199996} undirected simple Int64 graph

julia> complete_small = complete_graph(10)
{10, 45} undirected simple Int64 graph

julia> complete_med = complete_graph(300)
{300, 44850} undirected simple Int64 graph

julia> complete_large = complete_graph(10_000)
{10000, 49995000} undirected simple Int64 graph

# Real-world data graph 
julia> twitch_graph = loadgraph("/Users/antoine/Documents/GitHub/ParallelGraphs.jl/benchmark/data/large_twitch_edges.csv", "twitch", EdgeListFormat())
{168116, 6797558} directed simple Int64 graph

Results :

  Set{T} BitSet
Barabasi Albert    
Small 662.556 ns (9 allocations: 928 bytes) 411.692 ns (6 allocations: 624 bytes)
Med 13.791 μs (10 allocations: 11.28 KiB) 7.208 μs (7 allocations: 10.98 KiB)
Large 7.714 ms (15 allocations: 3.15 MiB) 4.736 ms (12 allocations: 3.15 MiB)
Dorogostev Mendes    
Small 638.643 ns (9 allocations: 928 bytes) 404.084 ns (6 allocations: 624 bytes)
Med 13.958 μs (10 allocations: 11.28 KiB) 7.229 μs (7 allocations: 10.98 KiB)
Large 7.721 ms (15 allocations: 3.15 MiB) 4.740 ms (12 allocations: 3.15 MiB)
Complete Graph    
Small 827.465 ns (9 allocations: 928 bytes) 479.492 ns (6 allocations: 624 bytes)
Med 378.917 μs (18 allocations: 21.38 KiB) 128.459 μs (7 allocations: 8.44 KiB)
Large 457.570 ms (30 allocations: 438.27 KiB) 123.255 ms (12 allocations: 246.50 KiB)
Twitch Graph    
  46.081 ms (18 allocations: 5.29 MiB) 31.722 ms (12 allocations: 5.29 MiB)

I tried to represent graphs that behave differently with respect to coloring, some of them are colored with only 5 colors (even on thousands of vertices) and the complete graph is colored in N colors. BitSet was faster in all configurations I tried, with speedup ranging from x1,4 to x3,5.

I hope that adresses your concerns! Let me know if there is anything else :)

gdalle commented 2 months ago

Just out of curiosity, could you try with a BitVector? Then we'll pick the best and merge. My gut tells me it may be as fast as BitSet sometimes