Shen-Lab / GraphCL

[NeurIPS 2020] "Graph Contrastive Learning with Augmentations" by Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, Yang Shen
MIT License

Questions About the Node Dropping Implementation #6

Closed Somedaywilldo closed 3 years ago

Somedaywilldo commented 3 years ago

Dear Authors,

Thanks for the code. In your paper, node dropping is described as: "Given the graph G, node dropping will randomly discard certain portion of vertices along with their connections".

But in the current implementation in unsupervised_TU/aug.py, the drop_nodes method only cuts off the edges and still keeps the nodes. This creates isolated nodes; wouldn't that be a problem? If you want to remove a node, I think you should re-index the nodes and edges; otherwise this augmentation is effectively a kind of "edge dropping", right?

import numpy as np
import torch

def drop_nodes(data):
    node_num, _ = data.x.size()
    _, edge_num = data.edge_index.size()
    drop_num = int(node_num / 10)

    # Sample nodes to drop; idx_dict maps old -> new indices but is never used below.
    idx_drop = np.random.choice(node_num, drop_num, replace=False)
    idx_nondrop = [n for n in range(node_num) if n not in idx_drop]
    idx_dict = {idx_nondrop[n]: n for n in range(node_num - drop_num)}

    # Zero out the rows/columns of the adjacency matrix for the dropped nodes,
    # i.e. remove their incident edges, while keeping the node features intact.
    edge_index = data.edge_index.numpy()
    adj = torch.zeros((node_num, node_num))
    adj[edge_index[0], edge_index[1]] = 1
    adj[idx_drop, :] = 0
    adj[:, idx_drop] = 0
    edge_index = adj.nonzero().t()

    data.edge_index = edge_index
    return data
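For comparison, the "true" node dropping with re-indexing that I have in mind could be sketched roughly like this (plain NumPy, illustrative only; the function name and signature are my own, not from the repo):

```python
import numpy as np

def drop_nodes_reindexed(x, edge_index, drop_ratio=0.1, rng=None):
    """Drop a fraction of nodes, remove their incident edges,
    and re-index the surviving nodes contiguously (sketch)."""
    rng = np.random.default_rng(rng)
    node_num = x.shape[0]
    drop_num = int(node_num * drop_ratio)

    # Sample nodes to drop and build a boolean keep-mask.
    idx_drop = rng.choice(node_num, size=drop_num, replace=False)
    keep_mask = np.ones(node_num, dtype=bool)
    keep_mask[idx_drop] = False

    # Old index -> new contiguous index for surviving nodes (-1 for dropped).
    new_idx = np.full(node_num, -1, dtype=np.int64)
    new_idx[keep_mask] = np.arange(keep_mask.sum())

    # Keep only edges whose endpoints both survive, then re-index them.
    edge_mask = keep_mask[edge_index[0]] & keep_mask[edge_index[1]]
    new_edge_index = new_idx[edge_index[:, edge_mask]]
    return x[keep_mask], new_edge_index
```

This way the feature matrix shrinks along with the edge list, so no isolated leftover nodes remain.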
yyou1996 commented 3 years ago

Hello @Somedaywilldo,

Thanks for your interest. In unsupervised_TU, the removal of isolated nodes is performed per batch in lines 211-225 of gsimclr.py, which completes the procedure you mentioned. You can print the data statistics of each batch to verify the implementation.

A more straightforward implementation is in semisupervised_TU, where node removal is performed during augmentation. The two should output the same thing for each batch, though I also prefer the latter.
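The post-augmentation cleanup idea (drop every node with no incident edges, then re-index) can be sketched as follows. This is plain NumPy and not the actual gsimclr.py code; the function name and signature are illustrative:

```python
import numpy as np

def remove_isolated_nodes(x, edge_index):
    """Drop nodes with no incident edges and re-index the rest
    (sketch of the per-batch cleanup idea, not the repo's code)."""
    node_num = x.shape[0]

    # A node is connected if it appears as either endpoint of any edge.
    connected = np.zeros(node_num, dtype=bool)
    connected[edge_index.ravel()] = True

    # Old index -> new contiguous index for connected nodes.
    new_idx = np.full(node_num, -1, dtype=np.int64)
    new_idx[connected] = np.arange(connected.sum())
    return x[connected], new_idx[edge_index]
```

Running this after drop_nodes would yield the same graphs as removing the nodes inside the augmentation itself.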

Somedaywilldo commented 3 years ago

Thanks for your reply! I just found that the two augmentation implementations are indeed different.