random_walks() seems slow

VHRanger / CSRGraph

A tiny library for large graphs

MIT License

111 stars 17 forks source link

import networkx as nx import numpy as np import csrgraph as cg from time import time import random walk_len = 20 G = nx.fast_gnp_random_graph(100, 0.3, directed=True) # just a random graph GG = cg.csrgraph(G) t1 = time() walks = GG.random_walks(walklen=walk_len) t2 = time() walks = np.zeros((len(G.nodes()), walk_len)) for i, node in enumerate(G.nodes()): cur_node = node for j in range(walk_len): neighbor = random.choice(list(G.neighbors(cur_node))) # choose one random neighbor walks[i][j] = neighbor # add it to walk cur_node = neighbor # save it to cur_node to continue from it next iteration t3 = time() print('cg ', t2-t1) print('naive ', t3-t2)

Hi @hananell

some concern here

And I tried another test below:

import networkx as nx
import numpy as np
from time import time
import random

walk_len = 20
trials = 100
warm_start = 15
warm_start_csr_runtime = []
csr_runtime = []

G = nx.fast_gnp_random_graph(100, 0.3, directed=True)    # just a random graph
GG = cg.csrgraph(G)

for t in range(warm_start):
    tic = time()

    walks = GG.random_walks(walklen=walk_len)

    toc = time()
    warm_start_csr_runtime.append(toc-tic)

print(
    warm_start_csr_runtime,
    np.mean(warm_start_csr_runtime),
    sep='\n\n'
)

gives

[1.511685848236084, 0.0009768009185791016, 0.0008192062377929688, 0.0009591579437255859, 0.001013040542602539, 0.0010869503021240234, 0.0008718967437744141, 0.0007932186126708984, 0.0009889602661132812, 0.0010161399841308594, 0.0010478496551513672, 0.0009562969207763672, 0.001085042953491211, 0.00102996826171875, 0.001010894775390625]

0.10168941815694173

As you see, the first trail took more time.

it seems like there is a compilation issue when executing _random_walk()

Since numba need to compile the data type to static one. so it cause some time when first run.

just do a warm_start to solve this.

VHRanger / CSRGraph

random_walks() seems slow #20