NorbertZheng / read-papers

My paper reading notes.

Edward Ma | Random Walk in Node Embeddings (DeepWalk, node2vec, LINE, and GraphSAGE). #47

Closed NorbertZheng closed 1 year ago

NorbertZheng commented 1 year ago

Edward Ma. Random Walk in Node Embeddings (DeepWalk, node2vec, LINE, and GraphSAGE).

NorbertZheng commented 1 year ago

Overview

Instead of using traditional machine learning classifiers, we can consider using a graph neural network (GNN) to solve node classification problems.

The usefulness of graph properties rests on the assumption that individual nodes are correlated with other similar nodes.

A typical example is a social media network. Imagine how Facebook connects you to somebody else based on which posts you like, where you check in, etc. A graph can represent this kind of relationship, and we can leverage it to train a GNN. Detailed use cases of GNNs will be covered in later stories.

We went through knowledge graph embeddings in the last story, which covered a framework to train graph embeddings via TransE and ComplEx at large scale. In this story, we would like to talk about graph structure and random walk-based models for learning graph embeddings. The following sections cover DeepWalk (Perozzi et al., 2014), node2vec (Grover and Leskovec, 2016), LINE (Tang et al., 2015), and GraphSAGE (Hamilton et al., 2018).

NorbertZheng commented 1 year ago

Graph Structure

Before going through those random walk-based models, we need to understand some basic graph structures.

NorbertZheng commented 1 year ago

DeepWalk

DeepWalk (Perozzi et al., 2014) learns node embeddings via random walks and the word2vec (Mikolov et al., 2013) algorithm. In natural language processing (NLP), we can apply the skip-gram model and feed sentences (series of words) to train word embeddings. In short, the training objective is to use the center word to predict the surrounding words. You may check this story for details.

Going back to DeepWalk, it picks a node randomly and "walks" to a neighbor node randomly until it reaches a maximum length (or some random length). Given the following knowledge graph (upper part), we pick "A" and "F" as starting points respectively. Later on, we move from node "A" to node "C" and so on.

image

Above: Knowledge Graph. Below: Random Walk Sequence

In this case, we treat a node and a series of nodes as a "word" and a "sentence" respectively. Later on, we can reuse the skip-gram algorithm to train the node embeddings until they converge.
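
A minimal sketch of this idea, assuming `networkx` and `gensim` are available; the toy edge list below is illustrative, not taken from the paper:

```python
# Minimal DeepWalk-style sketch: uniform random walks + skip-gram over the walks.
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walk(graph, start, length=10):
    """Uniformly walk from `start` for up to `length` steps, returning node names as strings."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(node) for node in walk]

# Toy graph (hypothetical, loosely matching the figure).
G = nx.Graph([("A", "C"), ("C", "B"), ("C", "D"), ("D", "F"), ("F", "E")])

# Treat each walk as a "sentence" and each node as a "word".
walks = [random_walk(G, node) for node in G.nodes() for _ in range(20)]

# Skip-gram (sg=1) over the walks yields the node embeddings.
model = Word2Vec(walks, vector_size=32, window=3, sg=1, min_count=0, epochs=10)
print(model.wv["A"])  # embedding for node "A"
```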

NorbertZheng commented 1 year ago

node2vec

node2vec (Grover and Leskovec, 2016) is an advanced version of DeepWalk (Perozzi et al., 2014). One limitation of DeepWalk (Perozzi et al., 2014) is that you cannot control the walk path. node2vec (Grover and Leskovec, 2016) also uses random behavior, but puts weights on it.

Instead of "walk" randomly, authors introduce Breadth-First-Sampling (BFS) and Depth-First-Sampling (DFS) to control the random behavior. BFS reaches immediate neighbors while DFS prefer node away from the source.

image

BFS: Breadth-First-Sampling. DFS: Depth-First-Sampling (Grover and Leskovec, 2016)

To provide this flexibility, the random walk probability is no longer unweighted; it is biased by the rule below. $d_{tx}$ is the shortest-path distance between the previous node $t$ and the candidate next node $x$, while $p$ and $q$ are parameters. A larger $p$ means there is less chance of resampling the node just visited, and $q$ controls whether we prefer BFS- or DFS-like behavior: a larger $q$ tends to keep the random walk close to the source.

$$ \alpha_{pq}(t,x)= \begin{cases} \frac{1}{p} & \text{if } d_{tx}=0, \\ 1 & \text{if } d_{tx}=1, \\ \frac{1}{q} & \text{if } d_{tx}=2. \end{cases} $$
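
A minimal sketch of the biased next-step sampling, assuming `networkx`; the helper name and parameter choices are my own, not from the paper's reference implementation:

```python
# node2vec-style biased sampling: weight each candidate x by alpha_pq(t, x) above.
import random
import networkx as nx

def next_node(graph, prev, curr, p=1.0, q=1.0):
    """Sample the next node x given the previous node t=`prev` and current node v=`curr`."""
    candidates, weights = [], []
    for x in graph.neighbors(curr):
        if x == prev:                      # d_tx = 0: returning to t
            alpha = 1.0 / p
        elif graph.has_edge(prev, x):      # d_tx = 1: x is also a neighbor of t
            alpha = 1.0
        else:                              # d_tx = 2: moving away from t
            alpha = 1.0 / q
        candidates.append(x)
        weights.append(alpha)
    return random.choices(candidates, weights=weights, k=1)[0]

G = nx.karate_club_graph()
walk = [0, 1]
for _ in range(8):
    walk.append(next_node(G, walk[-2], walk[-1], p=2.0, q=0.5))
print(walk)
```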

NorbertZheng commented 1 year ago

LINE

LINE (Tang et al., 2015), a.k.a. Large-scale Information Network Embedding, proposes to use first-order proximity and second-order proximity to learn node embeddings.

By minimizing the first-order proximity objective, connected nodes converge toward each other in the embedding space. Note that first-order proximity is only applicable to an undirected graph.

$$ O_{1}=-\sum_{(i,j) \in E}w_{ij}\log p_{1}(v_{i},v_{j}), $$

Similarly, we minimize the second-order proximity objective so that nodes which share similar neighbors converge toward each other.

$$ O_{2}=-\sum_{(i,j) \in E}w_{ij}\log p_{2}(v_{j}|v_{i}), $$
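
A small NumPy sketch of the two objectives, using the paper's probability forms ($p_{1}$ as a sigmoid of the embedding dot product, $p_{2}$ as a softmax over context embeddings); the random embeddings and the tiny weighted edge list are illustrative assumptions:

```python
# Toy evaluation of LINE's first-order (O1) and second-order (O2) objectives.
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8
U = rng.normal(size=(n, d))        # node embeddings u_i
C = rng.normal(size=(n, d))        # context embeddings u'_j (second-order only)
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0)]  # (i, j, w_ij)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# First-order: p1(v_i, v_j) = sigmoid(u_i . u_j)
O1 = -sum(w * np.log(sigmoid(U[i] @ U[j])) for i, j, w in edges)

# Second-order: p2(v_j | v_i) = softmax of context scores u'_k . u_i
def p2(j, i):
    scores = C @ U[i]
    return np.exp(scores[j]) / np.exp(scores).sum()

O2 = -sum(w * np.log(p2(j, i)) for i, j, w in edges)
print(O1, O2)
```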

NorbertZheng commented 1 year ago

GraphSAGE

GraphSAGE (Hamilton et al., 2018), a.k.a. Graph SAmple and aggreGatE, is a model that generates node embeddings on the fly. Unlike other models, it does not train node-specific embeddings but instead trains an aggregator. Due to this nature, it can handle unseen nodes.

This approach includes two major steps:

1. Sample a fixed-size set of neighbor nodes for each node.
2. Aggregate feature information from the sampled neighbors to compute the node's embedding.

The following figure illustrates these steps:

image

As mentioned before, it only samples some neighbor nodes for training in every mini-batch. A random walk approach is used, and it is unweighted (the same as DeepWalk).

But how do we aggregate them? The authors proposed four aggregators: the mean aggregator, GCN aggregator, LSTM aggregator, and pooling aggregator.
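
A minimal NumPy sketch of one layer with the mean aggregator; the weight shapes, toy features, and adjacency lists are illustrative assumptions, not the paper's reference implementation:

```python
# One GraphSAGE-style layer: sample neighbors, mean-aggregate, concat with self, project.
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 4, 6, 8
H = rng.normal(size=(n, d_in))                 # input node features h_v^{k-1}
neighbors = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
W = rng.normal(size=(2 * d_in, d_out))         # weights for concat(self, neighbor mean)

def sage_mean_layer(H, neighbors, W, num_samples=2):
    H_next = np.zeros((len(H), W.shape[1]))
    for v, neigh in neighbors.items():
        # Sample a fixed-size neighborhood, then average the sampled neighbors' features.
        sampled = rng.choice(neigh, size=min(num_samples, len(neigh)), replace=False)
        h_neigh = H[sampled].mean(axis=0)
        # Concatenate self and aggregated features, project, apply ReLU.
        H_next[v] = np.maximum(np.concatenate([H[v], h_neigh]) @ W, 0)
    # Normalize each embedding to unit length, as in the paper's algorithm.
    return H_next / np.linalg.norm(H_next, axis=1, keepdims=True)

print(sage_mean_layer(H, neighbors, W).shape)  # (4, 8)
```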

NorbertZheng commented 1 year ago

Take Away

NorbertZheng commented 1 year ago

Extension Reading

NorbertZheng commented 1 year ago

Reference