dmlc / dgl

Python package built to ease deep learning on graphs, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0
13.55k stars 3.01k forks

[Feature Request] Distributed Random Walks #2693

Open BarclayII opened 3 years ago

BarclayII commented 3 years ago

🚀 Feature

We have been asked for distributed (metapath-based) random walk support, so I'm bookkeeping the issue here.

Motivation

Large-scale network embedding for the following:

Short random walks are also necessary for loss computation in negative sampling, e.g.

Alternatives

Use a third-party tool to sample the random walk paths first. This may work for network embedding methods, but it is less straightforward for loss computation with random walks (as in GraphSAGE or GATNE).

Pitch

Efficient distributed random walk support for both homogeneous graph and heterogeneous graph (metapath-based in this case).
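
For concreteness, the metapath-based case can be sketched in plain Python. The dict-based graph layout, the `metapath` format, and the function name here are illustrative stand-ins, not the DGL API:

```python
# Hypothetical minimal sketch of a metapath-based random walk on a
# heterogeneous graph, using plain dicts in place of any DGL structure.
import random

# adjacency[etype][src] -> list of destination nodes for that edge type
def metapath_random_walk(adjacency, start, metapath, rng=random):
    """Walk from `start` following edge types in `metapath`.

    Returns the node trace; when a node has no outgoing edge of the
    required type, the remainder of the trace is padded with None
    (mirroring how dgl.sampling.random_walk pads unfinished traces).
    """
    trace = [start]
    current = start
    for etype in metapath:
        neighbors = adjacency.get(etype, {}).get(current, [])
        if not neighbors:
            trace.extend([None] * (len(metapath) - len(trace) + 1))
            break
        current = rng.choice(neighbors)
        trace.append(current)
    return trace

# Example: a tiny user-item graph with 'buys' / 'bought-by' edge types.
adjacency = {'buys': {0: [10, 11]}, 'bought-by': {10: [0], 11: [1]}}
print(metapath_random_walk(adjacency, 0, ['buys', 'bought-by']))
```

The distributed version would need the same per-hop lookup to work when `current` lives on a remote partition, which is where the design questions below come in.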

erickim555 commented 3 years ago

Is there any new ongoing work to support distributed random walks on DistGraph?

I'm working on a project where our graph is too large to fit on a single machine, hence the need for DistGraph, but we'd like to use random-walk neighbors for GraphSAGE rather than N-hop neighbors (which is the current distributed GraphSAGE approach).

erickim555 commented 3 years ago

Related question: would it be straightforward to implement a distributed random walk using DGL DistGraph python API functions?

I'm not super familiar with the DGL API yet, but I can see a solution where we re-implement `random_walk()` in terms of the distributed API functions (e.g. via a distributed `get_neighbors()`) rather than the local graph API.
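
A minimal local sketch of that idea: express `random_walk()` as one batched "pick one neighbor per frontier node" step per hop, which is the shape of query a distributed graph service could answer in a single round trip. The plain `adjacency` dict below is a stand-in for the DistGraph API, not actual DGL code:

```python
import random

def random_walk(adjacency, seeds, length, rng=random):
    """Run `length`-step walks from all `seeds` simultaneously.

    Batching the whole frontier per hop keeps the number of
    (hypothetical) remote calls at O(length) instead of
    O(length * num_seeds); dead-ended walks are padded with None.
    """
    traces = [[s] for s in seeds]
    frontier = list(seeds)
    for _ in range(length):
        # In a distributed setting, this comprehension would be a single
        # batched neighbor-sampling request to the graph service.
        frontier = [
            rng.choice(adjacency[v]) if v is not None and adjacency.get(v) else None
            for v in frontier
        ]
        for trace, nxt in zip(traces, frontier):
            trace.append(nxt)
    return traces
```

The catch, as discussed below, is that each hop may touch nodes scattered across partitions, so naive per-hop RPCs can be expensive.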

VoVAllen commented 3 years ago

@erickim555 It's inefficient for a random walk to cross machines. We will possibly support it by restricting random walks to individual partitions; does this meet your requirement? I'm also wondering how big your graph is. Did you hit any performance issues on a single machine?

erickim555 commented 3 years ago

@VoVAllen I see, thanks for the explanation! Random walks limited to each machine's own partition are acceptable.

My graph is on the order of ~4B nodes and 1B-10B edges. To my understanding, the graph structure itself is large enough that it doesn't seem possible to fit on a single machine (even on a memory-optimized AWS instance with ~1 TB of memory). What do you think?
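
For what it's worth, a rough back-of-envelope (my own assumptions, not from the thread: 64-bit node IDs, a CSR layout, and the upper 10B edge figure) suggests the bare topology alone is around 112 GB, so it is the per-node features that push a graph of this size past a 1 TB machine:

```python
# Back-of-envelope memory estimate; all figures are assumptions.
num_nodes = 4_000_000_000
num_edges = 10_000_000_000
bytes_per_id = 8  # 64-bit node IDs

csr_offsets = (num_nodes + 1) * bytes_per_id  # ~32 GB of row offsets
csr_indices = num_edges * bytes_per_id        # ~80 GB of column indices
structure_gb = (csr_offsets + csr_indices) / 1e9

# Even a modest 64-dim float32 feature per node dwarfs the structure:
features_gb = num_nodes * 64 * 4 / 1e9

print(round(structure_gb), round(features_gb))  # 112 1024
```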

jermainewang commented 3 years ago

As there has been increasing interest in this, I wonder what options are available out there. A brief googling points me to https://github.com/KnightKingWalk/KnightKing, which implements a SOSP '19 paper.