DeepGraphLearning / graphvite

GraphVite: A General and High-performance Graph Embedding System
https://graphvite.io
Apache License 2.0

Deepwalk results consistency #16

Closed PJthunder closed 5 years ago

PJthunder commented 5 years ago

I have trained node embeddings with the default parameters of deepwalk as well as with the config file in a yaml file. The embedding results look different on one of my downstream tasks. Is it possible to make the two implementations (deepwalk & graphvite) consistent? Since I do not see some parameters like "window-size" in the yaml file, I don't know whether they are still parameter options in graphvite.

KiddoZhu commented 5 years ago

Hi. Our implementation of deepwalk is slightly different from the original one, but they share the same high-level design and reach similar results in our experiments.

In graphvite, augmentation_step corresponds to the window_size in deepwalk. They may not take the exact same value, since we removed some tricks like random window subsampling from the original deepwalk.

By the way, it's really necessary to tune the augmentation step for each dataset. My heuristic is to use augmentation_step = floor(log(n) / log(|E| / |V|)), where n is about 1000.
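
In code, the heuristic looks like this (a minimal sketch; the function name and the example numbers are mine, not part of graphvite):

```python
import math

def suggest_augmentation_step(num_nodes, num_edges, n=1000):
    """floor(log(n) / log(|E| / |V|)), where |E| / |V| is the average degree."""
    avg_degree = num_edges / num_nodes
    return math.floor(math.log(n) / math.log(avg_degree))

# e.g. 1M nodes and 10M edges: average degree 10, so the suggested step is 3
print(suggest_augmentation_step(1_000_000, 10_000_000))  # 3
```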

PJthunder commented 5 years ago

Thanks. So if my understanding is correct, you do not really run the "random walk" in your deepwalk implementation, right? Are the augmentation steps more related to the local augmentation in the LINE paper?

I am working on some unsupervised tasks and I don't have labels for the nodes. How should I tune the parameters in that case?

In addition, could you please give a simple, high-level explanation of each parameter in the config file? Maybe such a document exists somewhere, but I failed to find it.

KiddoZhu commented 5 years ago

Not really. The only difference is that we don't explicitly store all the random walks, in order to reduce memory cost.

Original DeepWalk

  1. generate k random walks from each node
  2. train skip grams within a sliding window on all walks
  3. repeat 2 for several times

Note the original implementation doesn't repeat step 1, so the samples are not purely stochastic.

GraphVite's DeepWalk

  1. generate one random walk from a random node
  2. train skip grams within augmentation step on the walk
  3. repeat 1, 2 for many times

The augmentation step is equivalent to a directed sliding window. That is, only skip grams (i, i+s) with 0 < s <= augmentation_step are accepted. Since as_undirected is true by default, this is equivalent to a standard sliding window.
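
To make the contrast concrete, here is a rough Python sketch of the sampling scheme above; the names and the adjacency-list representation are illustrative, not graphvite's internals:

```python
import random

def graphvite_style_skip_grams(adjacency, walk_length, augmentation_step, num_walks):
    """Yield (source, context) pairs without materializing all walks.

    Only directed skip grams (i, i + s) with 0 < s <= augmentation_step
    are emitted; undirected input graphs contain the reversed pairs too.
    """
    nodes = list(adjacency)
    for _ in range(num_walks):
        # 1. generate one random walk from a random node
        walk = [random.choice(nodes)]
        while len(walk) < walk_length and adjacency[walk[-1]]:
            walk.append(random.choice(adjacency[walk[-1]]))
        # 2. emit skip grams within the augmentation step, then discard the walk
        for i in range(len(walk)):
            for s in range(1, augmentation_step + 1):
                if i + s < len(walk):
                    yield walk[i], walk[i + s]

# toy usage: skip grams from a 4-cycle
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
pairs = list(graphvite_style_skip_grams(graph, walk_length=5,
                                        augmentation_step=2, num_walks=10))
```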

KiddoZhu commented 5 years ago

I apologize that we haven't provided a tutorial on the hyperparameters; we are going to compose one in the next version update. You may find the description of the hyperparameters here, in GraphSolver.train.

Here are some basic insights for tuning hyperparameters of node embedding.

  1. augmentation_step matters a lot. It highly depends on the sparsity of your graph and the granularity of your downstream task. Granularity means how large a community of nodes you would like to regard as similar. Use higher values for sparser graphs and larger granularity.
  2. episode_size and num_negative determine your training speed. A larger episode_size is faster but consumes more CPU memory. A smaller num_negative is faster but sometimes harms convergence.
  3. num_negative * negative_weight may influence the performance. Our default setting is found to be robust on sparse social networks; I am not sure about others.
  4. The default num_epoch and lr are good for a reasonable result. For dense datasets, you can turn down num_epoch.

If you don't have labels, a workaround is to remove a few edges to form a validation set, and tune the hyperparameters according to link prediction performance on that set. Once you get the best setting, you can train on the whole original graph and deploy the embeddings for your downstream task. This assumes that the granularity of your downstream task is not far from that of link prediction.
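
A minimal sketch of that workaround (the split ratio, negative sampling, and all names here are my own choices for illustration, not a graphvite recipe):

```python
import random

def make_link_prediction_split(edges, nodes, valid_ratio=0.05, seed=0):
    """Hold out a few edges as validation positives and sample an equal
    number of non-edges as validation negatives."""
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    num_valid = int(len(edges) * valid_ratio)
    valid_pos, train_edges = edges[:num_valid], edges[num_valid:]
    edge_set = set(edges)
    valid_neg = []
    while len(valid_neg) < num_valid:
        u, v = rng.choice(nodes), rng.choice(nodes)
        if u != v and (u, v) not in edge_set:
            valid_neg.append((u, v))
    return train_edges, valid_pos, valid_neg

# train embeddings on train_edges, pick the hyperparameters with the best
# link prediction AUC on valid_pos vs. valid_neg, then retrain on all edges.
```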

PJthunder commented 5 years ago

Thanks for your explanations; they make the settings clear. Regarding the difference between your deepwalk and the original one: do you actually generate more "random walks" through the augmentation step than the original implementation does?

For the parameter setting, what is "random_walk_batch_size"? Will it influence the convergence rate or final embedding results?

For the evaluation, you said that graphvite has link prediction as an evaluation task. How does that work? Also, what classifier do you use in the link prediction?

Thank you for your help!

KiddoZhu commented 5 years ago

Yes. We generate somewhat more, but consume less memory.

random_walk_batch_size determines how many random walks are generated before we produce their skip gram samples. It's the batch size for CPU sampling, not for GPU training; in other words, it doesn't influence convergence. The best value may vary with the CPU cache size, but the default value generally works fine.

The interface of link prediction can be found here. Our link prediction works as follows

  1. Infer the probability of each test edge
  2. Sort the test edges in descending order of this probability
  3. Compute the AUC metric of this sequence

Ideally, all positive samples should be ranked before the negative ones. There isn't any classifier, since we found that using the dot product for DeepWalk / LINE / node2vec embeddings is more robust to dimension changes.
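
As a sketch, the whole evaluation reduces to dot-product scoring followed by a pairwise AUC (my own minimal reimplementation of the idea, not graphvite's code):

```python
import numpy as np

def link_prediction_auc(embeddings, positive_edges, negative_edges):
    """Score each edge by the dot product of its endpoint embeddings and
    compute the AUC: the probability that a random positive edge is
    ranked above a random negative edge (ties count as half)."""
    def score(edges):
        return np.array([embeddings[u] @ embeddings[v] for u, v in edges])
    pos, neg = score(positive_edges), score(negative_edges)
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```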

iamxpy commented 4 years ago

> Hi. Our implementation of deepwalk is slightly different from the original one, but they share the same high-level design and reach similar results in our experiments.
>
> In graphvite, augmentation_step corresponds to the window_size in deepwalk. They may not take the exact same value, since we removed some tricks like random window subsampling from the original deepwalk.
>
> By the way, it's really necessary to tune the augmentation step for each dataset. My heuristic is to use augmentation_step = floor(log(n) / log(|E| / |V|)), where n is about 1000.

@KiddoZhu Hi, I have some questions about the default settings. Does the default value "auto" for augmentation_step mean that augmentation_step = floor(log(1000) / log(|E| / |V|))? If so, my dataset has |E| = 116471433 and |V| = 3299324, which gives augmentation_step = 1, and that seems unreasonable. Should I avoid the default value and set augmentation_step to a larger number? Thanks.