aditya-grover / node2vec

http://snap.stanford.edu/node2vec/
MIT License

Consuming too much memory #10

Open · Soumyajit opened this issue 7 years ago

Soumyajit commented 7 years ago

I have a small graph of about 100MB (edge-list file size): #nodes = 65k, #edges = 3.5m. node2vec just does not run on this graph. I have traced the problem to the preprocess_transition_probs() function: it gradually eats my whole system memory within 30 minutes and everything hangs. I don't even reach the word2vec part after the random walks.

I am running experiments on an i7 laptop with 16GB of RAM. DeepWalk and LINE are able to process graphs of up to 1GB. DeepWalk on this 100MB file runs in about 5 minutes (including the gensim word2vec procedure).
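For context on why this blows up: in the reference Python implementation, preprocess_transition_probs() builds, for every directed edge (u, v), an alias table over the neighbors of v, so the total number of stored entries is roughly the sum of deg(v)^2 over all nodes, which can dwarf the edge list itself. Below is a minimal sketch for estimating that footprint from an edge list; the file name and the ~16 bytes/entry figure are illustrative assumptions, not measurements.

```python
# Rough estimate of the alias-table footprint built by node2vec's
# preprocess_transition_probs(): for each directed edge (u, v) it stores
# one alias table over the neighbors of v, i.e. about sum_v deg(v)^2
# entries in total for an undirected graph. The file name and the
# ~16 bytes/entry (one int + one float, ignoring Python object overhead)
# are illustrative assumptions.
from collections import Counter

def estimate_alias_memory(edge_list_path, bytes_per_entry=16):
    deg = Counter()
    with open(edge_list_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue
            u, v = parts[0], parts[1]
            deg[u] += 1
            deg[v] += 1
    entries = sum(d * d for d in deg.values())
    return entries, entries * bytes_per_entry / 1e9  # entries, approx GB

if __name__ == "__main__":
    entries, gb = estimate_alias_memory("graph.edgelist")
    print(f"~{entries:,} alias-table entries, ~{gb:.1f} GB")
```

For the graph above (65k nodes, 3.5M undirected edges, average degree ~108), the lower bound (2E)^2 / N already gives roughly 7.5e8 entries, i.e. on the order of 12 GB at ~16 bytes/entry before any Python dict/object overhead, which is consistent with a 16GB machine running out of memory.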

enricopal commented 7 years ago

Hi Soumyajit! I am using node2vec for my research and ran into a similar problem... I now use a less memory-intensive implementation, https://github.com/MultimediaSemantics/entity2vec, which saves the walks to a zip file and then reads them back through an iterator, one line at a time, for the word2vec learning part. You still need to get past the preprocess_transition_probs() part, though. Hope this helps!
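For anyone wanting to reproduce the streaming part of that approach, here is a minimal sketch of feeding pre-computed walks to gensim's Word2Vec without holding them all in memory. It assumes the walks were already written one per line as space-separated node ids; the gzipped file name and the hyperparameters are illustrative, not entity2vec's actual code.

```python
# Minimal sketch of streaming pre-computed walks into gensim's Word2Vec
# instead of keeping them all in memory. Assumes walks were written one
# per line as space-separated node ids; file name and hyperparameters
# are illustrative assumptions.
import gzip
from gensim.models import Word2Vec

class WalkCorpus:
    """Re-iterable corpus: gensim passes over it once per epoch."""
    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with gzip.open(self.path, "rt") as f:
            for line in f:
                yield line.split()

model = Word2Vec(
    WalkCorpus("walks.txt.gz"),
    vector_size=128,   # called "size" in older gensim versions
    window=10,
    min_count=0,
    sg=1,              # skip-gram, as in node2vec
    workers=4,
)
model.wv.save_word2vec_format("embeddings.emb")
```

The important design choice is that the corpus is a re-iterable object (a class with `__iter__`, not a one-shot generator), because gensim iterates over it multiple times during training.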

roks commented 7 years ago

Hi,

Have you tried our high-performance, multithreaded C++ implementation? https://github.com/snap-stanford/snap/tree/master/examples/node2vec

zhushun0008 commented 6 years ago

Is the C++ implementation of node2vec multithreaded?

zhushun0008 commented 6 years ago

@enricopal I tried entity2vec to generate walks, but it was too slow. Does a parallel version of walk generation exist?
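Walk generation itself is embarrassingly parallel across start nodes, so one workaround is to distribute it over processes. Below is a hedged sketch using the standard library; for simplicity it does uniform random walks (the p = q = 1 special case of node2vec, i.e. DeepWalk-style walks), so it avoids the transition-probability precomputation entirely. This is not entity2vec's or the reference implementation's actual code, and the edge-list path is an assumption.

```python
# Hedged sketch of parallelising walk generation across processes.
# Uniform random walks only (the p = q = 1 special case of node2vec),
# so no transition probabilities need to be precomputed.
import random
from multiprocessing import Pool

import networkx as nx

G = None  # set in each worker via the Pool initializer

def _init(graph):
    global G
    G = graph

def walk_from(start, walk_length=80):
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = list(G.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(n) for n in walk]

def generate_walks(graph, num_walks=10, walk_length=80, processes=4):
    nodes = list(graph.nodes())
    walks = []
    with Pool(processes, initializer=_init, initargs=(graph,)) as pool:
        for _ in range(num_walks):
            random.shuffle(nodes)
            walks.extend(pool.starmap(walk_from, ((n, walk_length) for n in nodes)))
    return walks

if __name__ == "__main__":
    g = nx.read_edgelist("graph.edgelist")  # path is an assumption
    all_walks = generate_walks(g)
    print(len(all_walks), "walks generated")
```

Note that each worker process receives its own pickled copy of the graph, so this trades extra memory for speed on large graphs.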

stray-leone commented 5 years ago

@zhushun0008 I tried https://github.com/snap-stanford/snap/tree/master/examples/node2vec. It runs very quickly; my input has 2,833,276 edges.

VVCepheiA commented 5 years ago

@roks The C++ implementation suffers from the same problem (preprocess_transition_probs eats a lot of memory, as described in an issue here). The program got killed by the system on a small graph with 20M edges :(

I think the problem is that some of us are trying to run it on projected/folded graphs. These graphs contain many cliques, and the precomputation of transition probabilities can push the cost close to O(E^2).
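A rough illustration of that effect, under the assumption that (as in the reference Python implementation) every directed edge (u, v) gets an alias table over v's neighbors: a single clique of n nodes contributes about n^2/2 edges but about n(n-1)^2 table entries, so the precomputation grows strongly superlinearly in E on clique-heavy graphs. The snippet below just prints that back-of-envelope estimate for a few clique sizes.

```python
# Back-of-envelope cost of the per-edge alias tables on a single clique:
# n*(n-1) directed edges, each with a table over (n-1) neighbours.
for n in (100, 1_000, 5_000):
    edges = n * (n - 1) // 2            # undirected edges in the clique
    entries = n * (n - 1) ** 2          # alias-table entries, both directions
    print(f"clique n={n}: ~{edges:,} edges -> ~{entries:,} alias entries")
```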

roks commented 5 years ago

node2vec requires a significant amount of memory for a graph of your size. Check out http://snap.stanford.edu/graphsage/ for a less memory-demanding solution.