Parameters for Les Miserables dataset - Githubissues

aditya-grover / node2vec

http://snap.stanford.edu/node2vec/

MIT License

Parameters for Les Miserables dataset #2

Closed mewwts closed 7 years ago

mewwts commented 7 years ago

Hi,

Thanks for node2vec - such an interesting idea.

Could I ask you to specify some additional parameters for case study 4.1 in your paper so that I can reproduce the community result?

For the top example you set p=1, q=0.5, but I'm wondering what you specified for num_walks and walk_length in the random walk generation, as well as size, window, min_count, sg and iter for Word2Vec.

Hope this isn't too cumbersome to reply to. Thanks again!

aditya-grover commented 7 years ago

Please direct all questions regarding the paper to adityag@cs.stanford.edu. Feel free to open an issue if there is any clarification specific to the node2vec implementation provided in this repository.

mewwts commented 7 years ago

Sure ¯\_(ツ)_/¯

Tixierae commented 6 years ago

@mewwts did you get the answer?

mewwts commented 6 years ago

Hey @Tixierae - I got some parameters from @aditya-grover back then. For word2vec size=8, window=2, sg=1, iter=1. I was however not able to replicate the results.

Tixierae commented 6 years ago

@mewwts many thanks for the quick reply! So they did use a non-default window size (the default is 10). It seems indeed to be a critical tuning parameter that really depends on the graph (e.g. see Figure 2 of Watch your step: Learning graph embeddings through attention - from Google). My guess is that the window size should be to some extent proportional to the size of the graph and to its diameter. It may be harmful to use a window of size 10 if the shortest path between any two nodes in the graph is, say, 3. Do you know by any chance what values of num_walks and walk_length they used?
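The intuition above (a window of 10 may be too wide when no two nodes are more than about 3 hops apart) can be checked by computing the graph's diameter before choosing a window. The following is a minimal sketch using breadth-first search; the toy edge list and the helper names are hypothetical and are not part of node2vec.

```python
from collections import deque

def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Longest shortest path among pairs of nodes that are connected."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

# Hypothetical toy graph: path a-b-c-d plus the chord a-c.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
toy_adj = build_adjacency(toy_edges)
toy_diameter = diameter(toy_adj)

window = 10  # the default window size mentioned above
print("diameter:", toy_diameter)
print("window exceeds diameter:", window > toy_diameter)
```

When the window exceeds the diameter, every node in a walk can end up in the context of every other node, which is one plausible reason a large window blurs community structure on small graphs.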

mewwts commented 6 years ago

Exactly, @Tixierae! Thanks for linking to that paper, looks like a good read. Printed it now.

I was not able to find the values of those parameters sadly. The email I got from @aditya-grover said the random-walk parameters were set to "very low values" due to network size being small.

Tixierae commented 6 years ago

Thanks @mewwts! @aditya-grover What would you recommend for num_walks, walk_length and window when the graph is small or very dense? Is there any rule of thumb for setting the window size based on graph density or diameter? PS: I know this may not be the best place to ask, but some quick feedback would be very welcome and would benefit more people than private messaging would. Thanks very much in advance!

mewwts commented 6 years ago

@Tixierae I think the best thing you can do for now is to grid search these parameters. The network is quite small, right?
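A coarse grid search over these parameters could look roughly like the sketch below. The candidate values are illustrative guesses, not values from the paper, and `placeholder_score` is a hypothetical stand-in that only exists so the example runs; in practice it would train embeddings with the given configuration and return a validation metric.

```python
from itertools import product

# Illustrative candidate values (assumptions, not recommendations).
grid = {
    "num_walks": [10, 50, 100],
    "walk_length": [5, 10, 20],
    "window": [2, 3, 5],
}

def iter_configs(grid):
    """Yield one dict per combination of parameter values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def grid_search(grid, score_fn):
    """Return the configuration with the highest score, and that score."""
    best_config, best_score = None, float("-inf")
    for config in iter_configs(grid):
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

def placeholder_score(config):
    # Hypothetical scorer: replace with "train node2vec, evaluate downstream".
    return (-abs(config["window"] - 3)
            - abs(config["walk_length"] - 10) / 10
            - abs(config["num_walks"] - 50) / 100)

best_config, best_score = grid_search(grid, placeholder_score)
print(len(list(iter_configs(grid))), "configurations tried")
print("best:", best_config)
```

The cost grows multiplicatively with each added parameter (27 runs here), which is why a coarse grid is only practical when each evaluation is cheap.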

Tixierae commented 6 years ago

@mewwts yes, each network is small, but I have thousands of them, for several datasets. The final task is graph classification, for which I am 10-fold cross validating a 2D CNN, with many epochs for each fold (I'm using this approach). So, I can do a coarse grid search, but each combination of parameters is quite costly to test. Hence, getting good priors would help a lot.

Tixierae commented 6 years ago

@mewwts section 8 of this paper: http://projekter.aau.dk/projekter/files/259997796/mi109f17___Vertex_Similarity.pdf

mewwts commented 6 years ago

Thanks @Tixierae - interesting!

annaguldberg commented 4 years ago

Hi, I have a network of 311 nodes. It is quite dense, with an average shortest path of 2. I have used p=1, q=2 and kept the window size and walk length very small, but I am not getting great results. Does anyone have any suggestions as to what could be wrong?

bianxintong commented 3 years ago

@mewwts section 8 of this paper: http://projekter.aau.dk/projekter/files/259997796/mi109f17___Vertex_Similarity.pdf

I was having a hard time replicating the homophily result (structural equivalence was somehow easier to replicate, idk why). Thanks to this study, I was finally able to go from this:

to this. If I size the nodes by node degree, I get the best approximation of the image in the paper that I have managed so far:

I guess when the graph is so small, we need to repeat the walk many times to make word2vec actually learn something; and since the window size is so small, we need to walk a long way to get the surrounding community structure. And the window size is definitely important.
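To make the roles of num_walks, walk_length, p and q concrete, here is a minimal pure-Python sketch of node2vec-style biased walks on an unweighted, undirected graph. It is an illustration under stated assumptions, not the repository's implementation: it recomputes transition weights at every step instead of precomputing alias tables, and the toy graph and function names are hypothetical.

```python
import random

def node2vec_walk(adj, start, walk_length, p, q, rng):
    """One second-order biased walk containing at most walk_length nodes."""
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for x in neighbors:
            if x == prev:
                weights.append(1.0 / p)   # return to the previous node
            elif x in adj[prev]:
                weights.append(1.0)       # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)   # move outward, away from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

def generate_walks(adj, num_walks, walk_length, p, q, seed=0):
    """Start num_walks walks from every node, in shuffled node order."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(adj, node, walk_length, p, q, rng))
    return walks

# Hypothetical toy graph: two triangles joined by the edge c-d.
toy_adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
walks = generate_walks(toy_adj, num_walks=100, walk_length=8, p=1, q=0.5)
print(len(walks), "walks, e.g.", walks[0])
```

With 6 nodes, 100 walks per node and 8 nodes per walk, word2vec sees 600 short "sentences"; on a graph this small, a low num_walks would leave it very little text to train on, which matches the observation above.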

sarmad-MOAHAMMED commented 3 years ago

@mewwts section 8 of this paper: http://projekter.aau.dk/projekter/files/259997796/mi109f17___Vertex_Similarity.pdf

Hi, Could you share the code for this project ?

Thanks.

bianxintong commented 3 years ago

Edited on 24-03-2021: First I compiled the node2vec binary, then ran:

!./node2vec -i:lesmisDir.edgelist -o:lesmisDir.emb -d:16 -l:8 -r:100 -k:2 -p:1 -q:0.5 -e:1

Then I ran k-means clustering with 5 clusters and exported the result to Gephi for graphing.

I found that the node2vec binary worked better than the open-source implementation (stellargraph in this case).

I stumbled upon my notes on replicating the results today, so I modified this comment. To be honest, I was frustrated by the amount of effort it took to replicate the result, which is why I didn't document my process well. But I think that is more a problem with node2vec itself: the hyperparameters are really sensitive and really depend on your graph.
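For readers mapping the command in this comment onto the parameter names used earlier in the thread, the sketch below builds the same argument list from named values. The flag meanings follow my reading of the SNAP node2vec README (-d dimensions, -l walk length, -r walks per source node, -k context size, i.e. the window, -e SGD epochs); treat that mapping as an assumption to verify against your build. The helper `node2vec_command` is hypothetical.

```python
def node2vec_command(edgelist, output, dimensions, walk_length, num_walks,
                     context_size, p, q, epochs, binary="./node2vec"):
    """Build the argument list for the compiled node2vec binary."""
    return [
        binary,
        f"-i:{edgelist}",      # input edge list
        f"-o:{output}",        # output embedding file
        f"-d:{dimensions}",    # embedding dimensions
        f"-l:{walk_length}",   # length of each walk
        f"-r:{num_walks}",     # walks started per source node
        f"-k:{context_size}",  # context (window) size
        f"-p:{p}",             # return parameter
        f"-q:{q}",             # in-out parameter
        f"-e:{epochs}",        # SGD epochs
    ]

cmd = node2vec_command(
    edgelist="lesmisDir.edgelist", output="lesmisDir.emb",
    dimensions=16, walk_length=8, num_walks=100,
    context_size=2, p=1, q=0.5, epochs=1,
)
print(" ".join(cmd))
```

If the binary is present in the working directory, the list can be passed to `subprocess.run(cmd, check=True)`; the sketch itself only prints the command so it runs without the binary.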