EdisonLeeeee / MaskGAE

[KDD 2023] What’s Behind the Mask: Understanding Masked Graph Modeling for Graph Autoencoders
https://arxiv.org/abs/2205.10053

About Code #3

Closed DuanhaoranCC closed 2 years ago

DuanhaoranCC commented 2 years ago

Hello, and thanks for your contribution. I need to seek guidance on a few questions.

deg = degree(row, num_nodes=num_nodes)
rowptr = row.new_zeros(num_nodes + 1)
torch.cumsum(deg, 0, out=rowptr[1:])
n_id, e_id = random_walk(rowptr, col, start, walk_length, 1.0, 1.0)
e_id = e_id[e_id != -1].view(-1)  # filter illegal edges
edge_mask[e_id] = False

Here rowptr is the degree matrix (N * 1) and col comes from edge_index, so the dimensions of rowptr and col are not consistent. Why? Also, what do n_id and e_id represent?

This is the official example:

row = torch.tensor([0, 1, 1, 1, 2, 2, 3, 3, 4, 4])
col = torch.tensor([1, 0, 2, 3, 1, 4, 1, 4, 2, 3])
start = torch.tensor([0, 1, 2, 3, 4])

walk = random_walk(row, col, start, walk_length=3)

EdisonLeeeee commented 2 years ago

Hi, thanks for your interest.

Here rowptr is the degree matrix (N * 1) and col comes from edge_index, so the dimensions of rowptr and col are not consistent. Why?

rowptr is the index pointer of the sparse matrix in CSR format, so it does not need to match the shape of row or col. It has shape (num_nodes + 1, ), and the slice rowptr[i]:rowptr[i + 1] marks the entries of col that are the neighbors of node i.
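To make the CSR layout concrete, here is a minimal pure-Python sketch (not the repository's code) that rebuilds rowptr for the edge list from the example quoted above. It assumes, as in that example, that the edges are already sorted by source node; `num_nodes = 5` is read off that example.

```python
from itertools import accumulate

# Edge list from the example quoted in this thread, sorted by source node.
row = [0, 1, 1, 1, 2, 2, 3, 3, 4, 4]
col = [1, 0, 2, 3, 1, 4, 1, 4, 2, 3]
num_nodes = 5

# Out-degree of every node, counted from the source column of the edge list.
deg = [0] * num_nodes
for r in row:
    deg[r] += 1

# Prefix sum of the degrees with a leading zero: this is the CSR index pointer.
rowptr = [0] + list(accumulate(deg))

print(len(rowptr), rowptr)
for i in range(num_nodes):
    # rowptr[i]:rowptr[i + 1] is the slice of `col` holding the neighbors of node i.
    print(i, col[rowptr[i]:rowptr[i + 1]])
```

rowptr has one entry per node plus one, while col has one entry per edge, which is why the two lengths differ.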

What do n_id and e_id represent?

n_id holds the ids of the nodes visited by the sampled random walks, while e_id holds the ids of the edges those walks traverse. The official example returns walk, which is exactly n_id in our code. We also need e_id so that we can mask the edges sampled by the random walks, so we use the implementation of torch_sparse rather than that in PyG directly.
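The relationship between n_id, e_id and the edge mask can be sketched in pure Python. This is only an illustration, not the library kernel: `random_walk_csr` is a hypothetical helper, the walk is assumed to be uniform (which is what the two 1.0 arguments would give if they are the node2vec p and q parameters), and recording -1 for a step that cannot be taken is an assumption modeled on the `e_id != -1` filter in the snippet above.

```python
import random

def random_walk_csr(rowptr, col, start, walk_length, seed=0):
    """Uniform random walks over a CSR graph (hypothetical helper).

    Returns (n_id, e_id): n_id[w] lists the walk_length + 1 nodes visited by
    walk w, and e_id[w] lists the walk_length edge indices (positions in
    `col`) it traversed. A step from a node without outgoing edges is
    recorded as -1 and the walk stays in place.
    """
    rng = random.Random(seed)
    n_id, e_id = [], []
    for s in start:
        nodes, edges, cur = [s], [], s
        for _ in range(walk_length):
            lo, hi = rowptr[cur], rowptr[cur + 1]
            if lo == hi:                   # dead end: no edge to follow
                edges.append(-1)
            else:
                e = rng.randrange(lo, hi)  # pick one outgoing edge uniformly
                edges.append(e)
                cur = col[e]
            nodes.append(cur)
        n_id.append(nodes)
        e_id.append(edges)
    return n_id, e_id

# CSR form of the example graph quoted in this thread.
rowptr = [0, 1, 4, 6, 8, 10]
col = [1, 0, 2, 3, 1, 4, 1, 4, 2, 3]
n_id, e_id = random_walk_csr(rowptr, col, start=[0, 1, 2, 3, 4], walk_length=3)

# Mask every edge touched by a walk, skipping the -1 placeholders.
edge_mask = [True] * len(col)
for walk in e_id:
    for e in walk:
        if e != -1:
            edge_mask[e] = False
```

Each entry of e_id indexes directly into col (and therefore into the edge list), which is what makes it usable for building edge_mask; n_id alone would not say which of several parallel or reverse edges was taken.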

Let me know if you have any further questions :)