Closed: PJthunder closed this issue 5 years ago
Hi. Our implementation of DeepWalk is slightly different from the original one, but they share the same high-level design and reach similar results in our experiments.

In GraphVite, augmentation_step corresponds to the window_size in DeepWalk. They may not take exactly the same value, since we removed some tricks like random window subsampling from the original DeepWalk.

By the way, it's really necessary to tune the augmentation step for each dataset. My heuristic is augmentation_step = floor(log(n) / log(|E| / |V|)), where n is about 1000.
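In code, that heuristic is just the following (a minimal sketch; the function name is mine, not part of GraphVite's API):

```python
import math

def suggested_augmentation_step(num_edges, num_nodes, n=1000):
    """Heuristic above: floor(log(n) / log(|E| / |V|)),
    where |E| / |V| is the average degree of the graph."""
    avg_degree = num_edges / num_nodes
    return math.floor(math.log(n) / math.log(avg_degree))

# e.g. 1M edges over 100k nodes (average degree 10):
# floor(log(1000) / log(10)) = 3
print(suggested_augmentation_step(1_000_000, 100_000))
```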
Thanks. So if my understanding is correct, you do not really run the "random walk" in your deepwalk implementation, right? Are the augmentation steps more related to the local augmentation in the LINE paper?

I am working on an unsupervised task and I don't have labels for the nodes. How should I tune the parameters?

In addition, could you please give a simple, high-level explanation of each parameter in the config file? Maybe that document exists somewhere, but I failed to find it.
Not really. The only difference is that we don't explicitly store all the random walks, which reduces memory cost.

Original DeepWalk:
1. Generate random walks from every node and store them all.
2. Extract skip-gram samples from the stored walks with a sliding window.
3. Train node embeddings on the samples.

Note the original implementation doesn't repeat step 1, so the samples are not purely stochastic.

GraphVite's DeepWalk:
1. Generate a small batch of random walks on the fly.
2. Extract skip-gram samples from the batch with the augmentation step.
3. Train node embeddings on the samples, then loop back to step 1.
The augmentation step is equivalent to a directed sliding window. That is, only skip grams (i, i+s) with 0 < s <= augmentation_step are accepted. Since as_undirected is true by default, this is equivalent to a standard sliding window.
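For intuition, here is a toy sketch of that window (my own illustration, not GraphVite's actual sampler; in GraphVite, as_undirected refers to treating edges as undirected, and emitting reversed pairs here is just a stand-in for that effect):

```python
def directed_skip_grams(walk, augmentation_step, as_undirected=True):
    """Yield skip grams (walk[i], walk[i+s]) with 0 < s <= augmentation_step.
    With as_undirected=True, reversed pairs are also emitted, which
    recovers the standard (symmetric) sliding window."""
    for i in range(len(walk)):
        for s in range(1, augmentation_step + 1):
            if i + s < len(walk):
                yield walk[i], walk[i + s]
                if as_undirected:
                    yield walk[i + s], walk[i]

# a walk a-b-c-d with augmentation_step=2:
print(list(directed_skip_grams(list("abcd"), 2, as_undirected=False)))
# [('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```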
I apologize that we didn't provide a tutorial on the hyperparameters. We are going to compose one in the next version update. For now, you may find descriptions of the hyperparameters here, in GraphSolver.train.
Here are some basic insights for tuning the hyperparameters of node embedding.

- augmentation_step matters a lot. It highly depends on the sparsity of your graph and the granularity of your downstream task, where granularity means how large a community of nodes you would like to regard as similar. Use a higher value for sparser graphs and larger granularity.
- episode_size and num_negative determine your training speed. A larger episode_size is faster, but consumes more CPU memory. A smaller num_negative is faster, but sometimes harms convergence.
- num_negative * negative_weight may influence the performance. Our default settings are found to be robust on sparse social networks; I am not sure about other kinds of graphs.
- The default num_epoch and lr are good for a reasonable result. For dense datasets, you can turn down num_epoch.

If you don't have labels, a workaround is to remove a few edges to form a validation set, and tune the hyperparameters according to link prediction performance on that validation set (see the sketch below). Once you get the best setting, you can train on the original whole graph and deploy it for your downstream task. This assumes that the granularity of your downstream task is not far from that of link prediction.
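A rough sketch of that workaround, assuming a networkx-style graph (the split ratio and helper name are mine, not part of GraphVite):

```python
import random

def split_validation_edges(graph, ratio=0.1, seed=0):
    """Hold out a fraction of edges as positive validation samples,
    pair them with an equal number of sampled non-edges as negatives,
    and return the reduced training graph."""
    rng = random.Random(seed)
    edges = list(graph.edges())
    rng.shuffle(edges)
    num_valid = max(1, int(len(edges) * ratio))
    valid_pos = edges[:num_valid]

    train_graph = graph.copy()
    train_graph.remove_edges_from(valid_pos)

    nodes = list(graph.nodes())
    valid_neg = []
    while len(valid_neg) < num_valid:
        u, v = rng.choice(nodes), rng.choice(nodes)
        if u != v and not graph.has_edge(u, v):
            valid_neg.append((u, v))
    return train_graph, valid_pos, valid_neg
```

Train embeddings on train_graph, rank valid_pos against valid_neg, and keep the hyperparameter setting with the best ranking before retraining on the full graph.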
Thanks for your explanations, that makes the settings clear. On the difference between your deepwalk and the original one: do you actually generate more "random walks" during the augmentation step than the original implementation does?

For the parameter settings, what is "random_walk_batch_size"? Will it influence the convergence rate or the final embedding results?

For the evaluation, you said graphvite has link prediction as an evaluation task. How does that work? Also, what classifier do you use in the link prediction?
Thank you for your help!
Yes. We generate somewhat more, but consume less memory.
random_walk_batch_size determines how many random walks are generated before their skip-gram samples are extracted. It's the batch size for CPU sampling, not for GPU training; in other words, it doesn't influence convergence. The best value may vary with the CPU cache size, but generally the default value works fine.
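Conceptually, the sampling loop behaves like the following (my sketch, not the actual C++ implementation; uniform_random_walk is an illustrative helper, and directed_skip_grams is from the sketch above):

```python
import random

def uniform_random_walk(graph, start, length, rng=random):
    """A plain uniform random walk (illustrative only)."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = list(graph[walk[-1]])
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

def cpu_sampling_loop(graph, starts, random_walk_batch_size, walk_length, step):
    """Generate random_walk_batch_size walks, then turn the whole batch
    into skip-gram samples before sampling the next batch. The batch
    size only affects CPU-side throughput, never the objective."""
    for i in range(0, len(starts), random_walk_batch_size):
        batch = [uniform_random_walk(graph, s, walk_length)
                 for s in starts[i:i + random_walk_batch_size]]
        for walk in batch:
            yield from directed_skip_grams(walk, step)  # sketched earlier
```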
The interface of link prediction can be found here. Our link prediction works as follows: we score each edge by the dot product of its two node embeddings, and rank the positive samples against sampled negative samples. Ideally, all positive samples should be ranked before the negative samples. There isn't any classifier, since we found that using the dot product, as in DeepWalk / LINE / node2vec, is more robust to dimension changes.
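A minimal version of that evaluation might look like this (my sketch, numpy only; embeddings is a mapping from node to vector, and reading "all positives before negatives" as pairwise ranking accuracy, i.e. AUC, is my interpretation):

```python
import numpy as np

def link_prediction_auc(embeddings, pos_edges, neg_edges):
    """Score every edge by the dot product of its node embeddings and
    return the fraction of (positive, negative) pairs ranked correctly,
    i.e. the AUC. No classifier is trained."""
    pos = np.array([embeddings[u] @ embeddings[v] for u, v in pos_edges])
    neg = np.array([embeddings[u] @ embeddings[v] for u, v in neg_edges])
    return float((pos[:, None] > neg[None, :]).mean())
```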
@KiddoZhu Hi, I have some questions about the default settings. Does the default value "auto" for augmentation_step mean that augmentation_step = floor(log(1000) / log(|E| / |V|))? If so, my dataset has |E| = 116471433 and |V| = 3299324, so |E| / |V| ≈ 35.3 and floor(log(1000) / log(35.3)) = floor(1.94) = 1, which seems unreasonable. Should I avoid the default value and set augmentation_step to a larger number? Thanks.
I have used the default parameters of deepwalk to train node embeddings, as well as the config file in a yaml file. The embedding results look different on one of my downstream tasks. Is it possible to make the two implementations (deepwalk & graphvite) consistent? Since I do not see parameters like "window-size" in the yaml file, I don't know whether they are still available options in graphvite.