FluxML / GeometricFlux.jl

Geometric Deep Learning for Flux
https://fluxml.ai/GeometricFlux.jl/stable/
MIT License

add benchmark script #214

Closed CarloLucibello closed 3 years ago

CarloLucibello commented 3 years ago

This PR adds a script that runs some basic benchmarks, in order to track performance changes as the library evolves.

I selected the GCNConv and GraphConv layers for testing, as representatives of the algebraic and the message-passing approaches respectively. Edit: also added GATConv. The benchmarks run on random graphs of small to moderate size (10 to 10k nodes). Different graph storage types are tested (adjacency list, dense adjacency matrix, sparse adjacency matrix, LightGraphs.SimpleGraph). These are the results on ~my laptop~ a server, which I'm also committing as a reference:

julia> df = run_benchmarks();

julia> for g in groupby(df, :layer); println(g,"\n"); end
16×8 SubDataFrame
 Row │ N       c         layer    gtype       time_fg                    time_cpu                   time_gpu  gpu_to_cpu 
     │ Int64?  Float64?  String?  Symbol?     Any                        Any                        Any       Float64?   
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │     10       6.0  GATConv  adjlist     TrialEstimate(18.473 μs)   TrialEstimate(852.293 μs)  missing          Inf
   2 │     10       6.0  GATConv  dense       TrialEstimate(3.054 μs)    TrialEstimate(831.063 μs)  missing          Inf
   3 │     10       6.0  GATConv  lightgraph  TrialEstimate(56.061 ns)   TrialEstimate(871.419 μs)  missing          Inf
   4 │     10       6.0  GATConv  sparse      TrialEstimate(4.438 μs)    TrialEstimate(1.128 ms)    missing          Inf
   5 │    100       6.0  GATConv  adjlist     TrialEstimate(161.550 μs)  TrialEstimate(11.370 ms)   missing          Inf
   6 │    100       6.0  GATConv  dense       TrialEstimate(76.516 μs)   TrialEstimate(11.984 ms)   missing          Inf
   7 │    100       6.0  GATConv  lightgraph  TrialEstimate(55.989 ns)   TrialEstimate(11.083 ms)   missing          Inf
   8 │    100       6.0  GATConv  sparse      TrialEstimate(70.337 μs)   TrialEstimate(14.045 ms)   missing          Inf
   9 │   1000       6.0  GATConv  adjlist     TrialEstimate(1.864 ms)    TrialEstimate(447.102 ms)  missing          Inf
  10 │   1000       6.0  GATConv  dense       TrialEstimate(9.331 ms)    TrialEstimate(465.438 ms)  missing          Inf
  11 │   1000       6.0  GATConv  lightgraph  TrialEstimate(56.867 ns)   TrialEstimate(447.711 ms)  missing          Inf
  12 │   1000       6.0  GATConv  sparse      TrialEstimate(6.740 ms)    TrialEstimate(471.377 ms)  missing          Inf
  13 │  10000       6.0  GATConv  adjlist     TrialEstimate(22.646 ms)   TrialEstimate(3.384 s)     missing          Inf
  14 │  10000       6.0  GATConv  dense       TrialEstimate(1.435 s)     TrialEstimate(6.987 s)     missing          Inf
  15 │  10000       6.0  GATConv  lightgraph  TrialEstimate(57.248 ns)   TrialEstimate(3.227 s)     missing          Inf
  16 │  10000       6.0  GATConv  sparse      TrialEstimate(1.002 s)     TrialEstimate(6.090 s)     missing          Inf

16×8 SubDataFrame
 Row │ N       c         layer    gtype       time_fg                    time_cpu                   time_gpu  gpu_to_cpu 
     │ Int64?  Float64?  String?  Symbol?     Any                        Any                        Any       Float64?   
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │     10       6.0  GCNConv  adjlist     TrialEstimate(18.570 μs)   missing                    missing      missing 
   2 │     10       6.0  GCNConv  dense       TrialEstimate(3.109 μs)    TrialEstimate(21.885 μs)   missing          Inf
   3 │     10       6.0  GCNConv  lightgraph  TrialEstimate(47.979 ns)   TrialEstimate(21.825 μs)   missing          Inf
   4 │     10       6.0  GCNConv  sparse      TrialEstimate(4.423 μs)    TrialEstimate(28.175 μs)   missing          Inf
   5 │    100       6.0  GCNConv  adjlist     TrialEstimate(160.488 μs)  missing                    missing      missing 
   6 │    100       6.0  GCNConv  dense       TrialEstimate(77.154 μs)   TrialEstimate(250.699 μs)  missing          Inf
   7 │    100       6.0  GCNConv  lightgraph  TrialEstimate(55.957 ns)   TrialEstimate(121.048 μs)  missing          Inf
   8 │    100       6.0  GCNConv  sparse      TrialEstimate(71.684 μs)   TrialEstimate(195.570 μs)  missing          Inf
   9 │   1000       6.0  GCNConv  adjlist     TrialEstimate(1.841 ms)    missing                    missing      missing 
  10 │   1000       6.0  GCNConv  dense       TrialEstimate(8.867 ms)    TrialEstimate(22.922 ms)   missing          Inf
  11 │   1000       6.0  GCNConv  lightgraph  TrialEstimate(56.552 ns)   TrialEstimate(1.026 ms)    missing          Inf
  12 │   1000       6.0  GCNConv  sparse      TrialEstimate(6.447 ms)    TrialEstimate(7.680 ms)    missing          Inf
  13 │  10000       6.0  GCNConv  adjlist     TrialEstimate(22.145 ms)   missing                    missing      missing 
  14 │  10000       6.0  GCNConv  dense       TrialEstimate(1.450 s)     TrialEstimate(3.092 s)     missing          Inf
  15 │  10000       6.0  GCNConv  lightgraph  TrialEstimate(57.708 ns)   TrialEstimate(9.887 ms)    missing          Inf
  16 │  10000       6.0  GCNConv  sparse      TrialEstimate(996.760 ms)  TrialEstimate(1.012 s)     missing          Inf

16×8 SubDataFrame
 Row │ N       c         layer      gtype       time_fg                    time_cpu                   time_gpu  gpu_to_cpu 
     │ Int64?  Float64?  String?    Symbol?     Any                        Any                        Any       Float64?   
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │     10       6.0  GraphConv  adjlist     TrialEstimate(18.532 μs)   TrialEstimate(429.637 μs)  missing          Inf
   2 │     10       6.0  GraphConv  dense       TrialEstimate(3.113 μs)    TrialEstimate(425.891 μs)  missing          Inf
   3 │     10       6.0  GraphConv  lightgraph  TrialEstimate(46.217 ns)   TrialEstimate(395.829 μs)  missing          Inf
   4 │     10       6.0  GraphConv  sparse      TrialEstimate(4.404 μs)    TrialEstimate(457.810 μs)  missing          Inf
   5 │    100       6.0  GraphConv  adjlist     TrialEstimate(161.810 μs)  TrialEstimate(7.470 ms)    missing          Inf
   6 │    100       6.0  GraphConv  dense       TrialEstimate(76.830 μs)   TrialEstimate(7.544 ms)    missing          Inf
   7 │    100       6.0  GraphConv  lightgraph  TrialEstimate(55.750 ns)   TrialEstimate(7.296 ms)    missing          Inf
   8 │    100       6.0  GraphConv  sparse      TrialEstimate(76.117 μs)   TrialEstimate(7.797 ms)    missing          Inf
   9 │   1000       6.0  GraphConv  adjlist     TrialEstimate(1.849 ms)    missing                    missing      missing 
  10 │   1000       6.0  GraphConv  dense       TrialEstimate(9.295 ms)    missing                    missing      missing 
  11 │   1000       6.0  GraphConv  lightgraph  TrialEstimate(56.272 ns)   missing                    missing      missing 
  12 │   1000       6.0  GraphConv  sparse      TrialEstimate(6.479 ms)    missing                    missing      missing 
  13 │  10000       6.0  GraphConv  adjlist     TrialEstimate(23.406 ms)   missing                    missing      missing 
  14 │  10000       6.0  GraphConv  dense       TrialEstimate(1.438 s)     missing                    missing      missing 
  15 │  10000       6.0  GraphConv  lightgraph  TrialEstimate(57.641 ns)   missing                    missing      missing 
  16 │  10000       6.0  GraphConv  sparse      TrialEstimate(1.004 s)     missing                    missing      missing 
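For context, the overall shape of the benchmarking loop can be sketched as follows. This is a simplified stand-in, not the script itself: a plain matmul replaces the GeometricFlux layers, and Base timing replaces BenchmarkTools; `fake_layer` and `run_benchmarks_sketch` are invented names.

```julia
# Simplified sketch of the benchmark loop. The real script uses
# BenchmarkTools' @benchmark and the actual GeometricFlux layers;
# here a plain matmul stands in for a layer, and Base timing is used.
using LinearAlgebra

# Hypothetical stand-in for a graph layer: X -> W * (X * A)
fake_layer(W, A, X) = W * (X * A)

function run_benchmarks_sketch(; Ns = [10, 100], feat = 16)
    results = NamedTuple[]
    for N in Ns
        A = rand(N, N) .< 0.1           # dense random adjacency (stand-in)
        X = randn(feat, N)              # node features, feat × N
        W = randn(feat, feat)           # layer weights
        fake_layer(W, A, X)             # warm up (trigger compilation)
        t = @elapsed fake_layer(W, A, X)
        push!(results, (N = N, time_cpu = t))
    end
    return results
end
```

The real script additionally sweeps layer types and graph storage types, and records gradient and GPU timings where available.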

A few notes:

Quite surprisingly, the most performant graph type seems to be the LightGraphs one, with performance several orders of magnitude better for GCNConv. Since GCNConv calls adjacency_matrix, that's a problem (for the other graph types) that should be easy to fix by providing better specialized methods. This is the comparison between lightgraph and the other storage types:

julia> dfcomp = compare(dfmaster, dfmaster);

julia> filter(r -> r.layer == "GCNConv" && r.in_type_master==:lightgraph, dfcomp)
16×6 DataFrame
 Row │ n       in_type_pr  in_type_master  layer    pr_to_master_cpu       pr_to_master_gpu 
     │ Int64?  Symbol?     Symbol?         String?  TrialRatio…?           Missing          
─────┼──────────────────────────────────────────────────────────────────────────────────────
   1 │     10  adjlist     lightgraph      GCNConv  missing                         missing 
   2 │     10  dense       lightgraph      GCNConv  TrialRatio(99.99%)              missing 
   3 │     10  lightgraph  lightgraph      GCNConv  TrialRatio(100.00%)             missing 
   4 │     10  sparse      lightgraph      GCNConv  TrialRatio(119.10%)             missing 
   5 │    100  adjlist     lightgraph      GCNConv  missing                         missing 
   6 │    100  dense       lightgraph      GCNConv  TrialRatio(220.54%)             missing 
   7 │    100  lightgraph  lightgraph      GCNConv  TrialRatio(100.00%)             missing 
   8 │    100  sparse      lightgraph      GCNConv  TrialRatio(116.22%)             missing 
   9 │   1000  adjlist     lightgraph      GCNConv  missing                         missing 
  10 │   1000  dense       lightgraph      GCNConv  TrialRatio(1774.30%)            missing 
  11 │   1000  lightgraph  lightgraph      GCNConv  TrialRatio(100.00%)             missing 
  12 │   1000  sparse      lightgraph      GCNConv  TrialRatio(493.16%)             missing 
  13 │  10000  adjlist     lightgraph      GCNConv  missing                         missing 
  14 │  10000  dense       lightgraph      GCNConv  TrialRatio(20334.98%)           missing 
  15 │  10000  lightgraph  lightgraph      GCNConv  TrialRatio(100.00%)             missing 
  16 │  10000  sparse      lightgraph      GCNConv  TrialRatio(6174.97%)            missing 

julia> filter(r -> r.layer == "GraphConv" && r.in_type_master==:lightgraph, dfcomp)
16×6 DataFrame
 Row │ n       in_type_pr  in_type_master  layer      pr_to_master_cpu     pr_to_master_gpu 
     │ Int64?  Symbol?     Symbol?         String?    TrialRatio…?         Missing          
─────┼──────────────────────────────────────────────────────────────────────────────────────
   1 │     10  adjlist     lightgraph      GraphConv  TrialRatio(107.25%)           missing 
   2 │     10  dense       lightgraph      GraphConv  TrialRatio(102.36%)           missing 
   3 │     10  lightgraph  lightgraph      GraphConv  TrialRatio(100.00%)           missing 
   4 │     10  sparse      lightgraph      GraphConv  TrialRatio(90.26%)            missing 
   5 │    100  adjlist     lightgraph      GraphConv  TrialRatio(106.08%)           missing 
   6 │    100  dense       lightgraph      GraphConv  TrialRatio(105.67%)           missing 
   7 │    100  lightgraph  lightgraph      GraphConv  TrialRatio(100.00%)           missing 
   8 │    100  sparse      lightgraph      GraphConv  TrialRatio(90.70%)            missing 
   9 │   1000  adjlist     lightgraph      GraphConv  TrialRatio(98.73%)            missing 
  10 │   1000  dense       lightgraph      GraphConv  TrialRatio(109.35%)           missing 
  11 │   1000  lightgraph  lightgraph      GraphConv  TrialRatio(100.00%)           missing 
  12 │   1000  sparse      lightgraph      GraphConv  TrialRatio(132.64%)           missing 
  13 │  10000  adjlist     lightgraph      GraphConv  TrialRatio(99.90%)            missing 
  14 │  10000  dense       lightgraph      GraphConv  TrialRatio(279.31%)           missing 
  15 │  10000  lightgraph  lightgraph      GraphConv  TrialRatio(100.00%)           missing 
  16 │  10000  sparse      lightgraph      GraphConv  TrialRatio(228.48%)           missing 
CarloLucibello commented 3 years ago

It would be nice to get this merged quickly to help with other PRs.

yuehhua commented 3 years ago

Thank you for your work benchmarking the graph convolutional layers. Not only the scale of the graph is important; the number of edges also matters. A graph usually contains far fewer edges than the maximum: for a graph with N nodes, the total possible edges are E = N(N+1)/2, while a real graph may have 0.5 E edges, or even as few as 0.1 E. For the benchmark script, I think it would be good to have an edge_ratio kwarg to set the proportion of randomly connected edges.
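The edge_ratio idea could be sketched like this. Note that `rand_graph_by_ratio` is an invented name, not the PR's API, and this sketch counts N(N-1)/2 possible edges, i.e. simple undirected graphs without self-loops:

```julia
# Hypothetical sketch of the edge_ratio idea: build a random undirected
# graph whose edge count is a given fraction of the maximum N*(N-1)/2.
function rand_graph_by_ratio(N::Int; edge_ratio::Float64 = 0.1)
    adjlist = [Int[] for _ in 1:N]
    max_edges = N * (N - 1) ÷ 2
    nedges = round(Int, edge_ratio * max_edges)
    seen = Set{Tuple{Int,Int}}()
    while length(seen) < nedges
        i, j = rand(1:N), rand(1:N)
        i == j && continue              # no self-loops
        e = i < j ? (i, j) : (j, i)     # canonical undirected edge
        e in seen && continue           # no duplicate edges
        push!(seen, e)
        push!(adjlist[i], j)
        push!(adjlist[j], i)
    end
    return adjlist
end
```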

GraphConv is more than 10x slower than GCNConv, at least at smaller sizes.

Actually, I don't have much idea about this.

Since GCNConv calls adjacency_matrix, that's a problem (for the other graph types) that should be easily fixed by providing better specialized methods.

Whether it is an adjacency matrix or another format, it has to be transformed into the normalized Laplacian matrix. The specialized methods could directly support building the normalized Laplacian from each format.
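As a sketch of what such a specialized method might look like for the sparse format (an assumption for illustration, not code from the PR; it presumes no isolated nodes, so all degrees are nonzero):

```julia
# Build the symmetric normalized Laplacian L = I - D^(-1/2) A D^(-1/2)
# directly from a sparse adjacency matrix, without densifying first.
using SparseArrays, LinearAlgebra

function normalized_laplacian_sparse(A::SparseMatrixCSC)
    d = vec(sum(A; dims = 2))             # node degrees
    Dinvsqrt = Diagonal(1 ./ sqrt.(d))    # D^(-1/2), assumes d .> 0
    return I - Dinvsqrt * A * Dinvsqrt    # stays sparse
end
```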

CarloLucibello commented 3 years ago

For example, there are N nodes in a graph and the total possible edges are E = N(N+1)/2. Maybe a graph has 0.5 E edges, even less to 0.1 E edges. For a benchmark script, I think it is good to have kwargs for edge_ratio to set the proportion of randomly connected edges.

Most natural networks are better modeled by an average degree that is constant or scales very weakly with N; having O(N) degree would produce unrealistically dense graphs. I added the degree as a keyword argument to run_benchmarks so that one can experiment with different values.
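Concretely, an Erdős–Rényi graph with edge probability p = c / (N - 1) has expected degree c independent of N. A minimal self-contained sketch of that setup (`erdos_renyi_adjlist` is an invented helper; the script itself can use LightGraphs' generators):

```julia
# Erdős–Rényi graph with constant expected degree c, as an adjacency list.
# Each of the N*(N-1)/2 possible edges is included with p = c/(N-1),
# so E[degree] = c regardless of N.
function erdos_renyi_adjlist(N::Int, c::Real)
    p = c / (N - 1)
    adjlist = [Int[] for _ in 1:N]
    for i in 1:N, j in (i + 1):N
        if rand() < p
            push!(adjlist[i], j)
            push!(adjlist[j], i)
        end
    end
    return adjlist
end
```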

CarloLucibello commented 3 years ago

Benchmarking on Erdős–Rényi graphs exposed another bug: GraphConv fails when the graph contains isolated nodes.
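One plausible failure mode with isolated nodes, shown on a toy aggregation (this is an assumption about the cause for illustration, not the actual stack trace):

```julia
# An isolated node has no neighbors, so there are no messages to
# aggregate, and reducing over the empty collection throws.
neighbors = Int[]                         # isolated node: no neighbors
X = randn(4, 3)                           # some node features
msgs = [X[:, j] for j in neighbors]       # zero messages
try
    reduce(+, msgs)                       # throws: empty reduce, no zero element
catch err
    @show typeof(err)
end
# A guard such as `isempty(msgs) ? zeros(4) : reduce(+, msgs)` avoids it.
```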

CarloLucibello commented 3 years ago

Can we merge this?

yuehhua commented 3 years ago

I would merge this first, but I think it would be better to also benchmark the conversion function laplacian_matrix and a pure propagate function for each graph structure. Pure means the message and update functions return their arguments directly, e.g. message(x_i, x_j) = x_j and update(x, m) = m. These micro-benchmarks would capture the effort needed to convert from the original structure, and the indexing performance of each structure.
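Such a pure-propagate micro-benchmark could be sketched like this. `pure_propagate` is an invented name, and the message/update signatures follow the description above rather than GeometricFlux's exact API; with identity message and update, the timing isolates neighbor indexing and gathering:

```julia
# Pure propagate: message and update pass features through unchanged,
# so only the cost of traversing the structure is measured.
message(x_i, x_j) = x_j
update_node(x, m) = m

function pure_propagate(adjlist, X)       # X: feat × N feature matrix
    N = length(adjlist)
    out = similar(X)
    for i in 1:N
        m = zeros(eltype(X), size(X, 1))  # sum-aggregate incoming messages
        for j in adjlist[i]
            m .+= message(view(X, :, i), view(X, :, j))
        end
        out[:, i] .= update_node(view(X, :, i), m)
    end
    return out
end
```

Running this over each storage type (after converting it to a neighbor-iterable form) would separate structure-access cost from layer arithmetic.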