dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0

random walk with restart (dgl 0.4.3 version) #6159

Open atesnrllh opened 11 months ago

atesnrllh commented 11 months ago

I understood that "restart_prob" affects the random walk length and "max_nodes_per_seed" affects the number of traces. I tested "random_walk_with_restart" with many different parameters and noticed that "restart_prob" also affects the number of traces, but I don't understand how. I am now using a newer version of dgl, and I need to understand this old function in order to reimplement it with the new library.

BarclayII commented 11 months ago

What do you mean by "trace number"?

atesnrllh commented 11 months ago

In the old version, the "max_nodes_per_seed" parameter sets an upper bound on the number of traces, but the actual number of traces varies. That's why I thought the number of traces depends on the "restart_prob" parameter: when this parameter gets bigger, the total number of traces increases, and when it gets smaller, the total number of traces decreases. I know that "restart_prob" determines the length of each random walk, but I am not sure whether it also affects the total trace count. If it does, how?

dgl 0.4.3 random_walk_with_restart function

def random_walk_with_restart(
        g, seeds, restart_prob, max_nodes_per_seed,
        max_visit_counts=0, max_frequent_visited_nodes=0):
    """Batch-generate random walk traces on given graph with restart probability.

    Parameters
    ----------
    g : DGLGraph
        The graph.
    seeds : Tensor
        The node ID tensor from which the random walk traces starts.
    restart_prob : float
        Probability to stop a random walk after each step.
    max_nodes_per_seed : int
        Stop generating traces for a seed if the total number of nodes
        visited exceeds this number. [1]
    max_visit_counts : int, optional
    max_frequent_visited_nodes : int, optional
        Alternatively, stop generating traces for a seed if no less than
        ``max_frequent_visited_nodes`` are visited no less than
        ``max_visit_counts`` times.  [1]

BarclayII commented 11 months ago

It shouldn't affect the number of random walk paths. That number should always equal the number of seeds.

atesnrllh commented 11 months ago

Everything is the same except restart_prob. I tested the dgl 0.4.3 "random_walk_with_restart" function with different restart_prob values, running each setting 10 times in a for loop.

walks = dgl.contrib.sampling.random_walk_with_restart(G, seeds=[1], restart_prob=0.9, max_nodes_per_seed=256)

Total number of traces over 10 runs: 2293

walks = dgl.contrib.sampling.random_walk_with_restart(G, seeds=[1], restart_prob=0.5, max_nodes_per_seed=256)

Total number of traces over 10 runs: 1307

walks = dgl.contrib.sampling.random_walk_with_restart(G, seeds=[1], restart_prob=0.1, max_nodes_per_seed=256)

Total number of traces over 10 runs: 237
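
One possible reading of these totals, based only on the docstring quoted earlier: if traces keep being generated for a seed until about "max_nodes_per_seed" nodes have been visited, then each run spends roughly 256 nodes, and the number of traces is roughly 256 divided by the average trace length. A walk that stops with probability restart_prob after each step has an average length of about 1 / restart_prob, so shorter walks would mean more traces. The arithmetic below is a rough check of that assumption against the reported numbers. The helper name `nodes_per_trace` is made up for illustration, and whether the seed itself counts toward the budget is not known from the thread.

```python
# Totals reported in the thread: restart_prob -> traces over 10 runs, one seed.
observed = {0.9: 2293, 0.5: 1307, 0.1: 237}
RUNS = 10
BUDGET = 256  # the max_nodes_per_seed value used in all three experiments


def nodes_per_trace(total_traces, runs=RUNS, budget=BUDGET):
    """Average nodes per trace, ASSUMING each run spends about `budget` nodes."""
    return budget * runs / total_traces


for p, total in observed.items():
    # 1 / p is the mean number of steps of a walk that stops with
    # probability p after each step (geometric distribution).
    print(f"restart_prob={p}: implied {nodes_per_trace(total):.2f} nodes/trace, "
          f"1/restart_prob = {1 / p:.2f}")
```

If the implied nodes per trace land close to 1 / restart_prob for all three settings, that supports the idea that restart_prob changes the trace count only indirectly, through the node budget.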

BarclayII commented 10 months ago

We have stopped supporting 0.4.3. Could you check out this function that also has restart probability support?
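
For anyone porting the old behaviour, here is a minimal sketch of the budget loop that the 0.4.3 docstring describes: keep drawing traces for a seed until the total number of visited nodes exceeds "max_nodes_per_seed". This is an interpretation of the docstring, not the actual 0.4.3 implementation. `traces_until_budget` and `make_toy_walker` are hypothetical names, and the toy walker on a plain adjacency dict only stands in for a real sampler. In newer dgl releases, a sampler such as `dgl.sampling.random_walk` accepts a `restart_prob` argument and could play the role of `walk_once` after stripping any padding from its output.

```python
import random


def traces_until_budget(walk_once, seed, max_nodes_per_seed):
    """Draw traces from `seed` until the visited-node total exceeds the budget."""
    traces, visited = [], 0
    while visited <= max_nodes_per_seed:
        trace = walk_once(seed)   # one random walk, as a list of node IDs
        traces.append(trace)
        visited += len(trace)     # every trace has >= 1 node, so this terminates
    return traces


def make_toy_walker(neighbors, restart_prob, rng):
    """Hypothetical stand-in for a real sampler; walks an adjacency dict."""
    def walk_once(seed):
        trace, node = [], seed
        while True:
            node = rng.choice(neighbors[node])  # take one uniform step
            trace.append(node)
            if rng.random() < restart_prob:     # stop after this step
                return trace
    return walk_once


# Toy 5-node ring graph, used only to exercise the loop.
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
rng = random.Random(0)
for p in (0.9, 0.5, 0.1):
    walker = make_toy_walker(ring, p, rng)
    traces = traces_until_budget(walker, seed=1, max_nodes_per_seed=256)
    print(f"restart_prob={p}: {len(traces)} traces")
```

Under this model the number of traces per seed is not fixed: it grows with restart_prob because each trace uses up less of the node budget.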

github-actions[bot] commented 9 months ago

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you.