VHRanger / CSRGraph

A tiny library for large graphs
MIT License
111 stars 17 forks source link

Faster sampling with Vose's alias method #5

Open MaxHalford opened 3 years ago

MaxHalford commented 3 years ago

Hey!

This is not an issue but rather a suggestion :).

From what I understand, you walk on a graph by randomly hopping from one node to another. You do this via a roulette wheel method that runs in a n + log(n) time: n for building the wheel and log(n) for the subsequent binary search. There's an O(1) way to do via Vose's alias method (this is definitely worth a read). I implemented this in Cython here. Naturally this costs a bit of memory, but it might be worth it, I don't know.

Just wanted to make sure you knew about this!

PS: great article on challenging GNNs, that's how I found you :)

VHRanger commented 3 years ago

Hey Max,

Thanks for the comment!

In our case it's n + log(n) but n is the number of edges for a node. So unless we're working on graphs with a huge average degree, considerations like cache locality and efficient implementation matters most for performance (since n < 50 in most cases we see).

That said, I'll keep the issue open so I can look into benchmarking your proposed method against my "naive" method

VHRanger commented 3 years ago

Just to track improvements to the random walk sampling in this issue, it should be possible to accelerate the node2vec walks with this:

https://louisabraham.github.io/articles/node2vec-sampling.html