VHRanger / CSRGraph

A tiny library for large graphs
MIT License
111 stars 17 forks source link

random_walks() seems slow #20

Open hananell opened 2 years ago

hananell commented 2 years ago

Hi I am searching for a quick way to do random walks and saw this package that claims to be good, but when comparing it with naive python approach, the naive approach was much faster... what am I missing?

import networkx as nx
import numpy as np
import csrgraph as cg
from time import time
import random

walk_len = 20
G = nx.fast_gnp_random_graph(100, 0.3, directed=True)    # just a random graph
GG = cg.csrgraph(G)

t1 = time()

walks = GG.random_walks(walklen=walk_len)

t2 = time()

walks = np.zeros((len(G.nodes()), walk_len))
for i, node in enumerate(G.nodes()):
    cur_node = node
    for j in range(walk_len):
        neighbor = random.choice(list(G.neighbors(cur_node)))   # choose one random neighbor
        walks[i][j] = neighbor   # add it to walk
        cur_node = neighbor   # save it to cur_node to continue from it next iteration

t3 = time()

print('cg ', t2-t1)
print('naive ', t3-t2)

output: cg 5.419240713119507 naive 0.010957479476928711

YLTsai0609 commented 1 year ago

Hi @hananell

some concern here

And I tried another test below:

import networkx as nx
import numpy as np
from time import time
import random

walk_len = 20
trials = 100
warm_start = 15
warm_start_csr_runtime = []
csr_runtime = []

G = nx.fast_gnp_random_graph(100, 0.3, directed=True)    # just a random graph
GG = cg.csrgraph(G)

for t in range(warm_start):
    tic = time()

    walks = GG.random_walks(walklen=walk_len)

    toc = time()
    warm_start_csr_runtime.append(toc-tic)

print(
    warm_start_csr_runtime,
    np.mean(warm_start_csr_runtime),
    sep='\n\n'
)

gives

[1.511685848236084, 0.0009768009185791016, 0.0008192062377929688, 0.0009591579437255859, 0.001013040542602539, 0.0010869503021240234, 0.0008718967437744141, 0.0007932186126708984, 0.0009889602661132812, 0.0010161399841308594, 0.0010478496551513672, 0.0009562969207763672, 0.001085042953491211, 0.00102996826171875, 0.001010894775390625]

0.10168941815694173

As you see, the first trail took more time.

it seems like there is a compilation issue when executing _random_walk()

Since numba need to compile the data type to static one. so it cause some time when first run.

just do a warm_start to solve this.