HazyResearch / hgcn

Hyperbolic Graph Convolutional Networks in PyTorch.

Inconsistent with the paper? HypAgg class without attention? #14

Closed LyndonCKZ closed 4 years ago

LyndonCKZ commented 4 years ago

Thanks for releasing the detailed code! However, I could not find the attention mechanism mentioned in the paper; I apologize if I missed something. It is also hard to reproduce the results with the Hyperboloid model, since all listed configurations are for the Poincaré model only. I would really appreciate it if the detailed Hyperboloid configurations could be released, as that model is the main one discussed in the paper. Regards,

import torch
from torch.nn.modules.module import Module


class HypAgg(Module):
    """Hyperbolic aggregation layer."""

    def __init__(self, manifold, c, in_features, dropout):
        super(HypAgg, self).__init__()
        self.manifold = manifold
        self.c = c
        self.in_features = in_features
        self.dropout = dropout

    def forward(self, x, adj):
        # Map hyperbolic points to the tangent space at the origin, aggregate
        # with the (sparse) adjacency matrix, then map back to the manifold.
        x_tangent = self.manifold.logmap0(x, c=self.c)
        support_t = torch.spmm(adj, x_tangent)
        output = self.manifold.proj(self.manifold.expmap0(support_t, c=self.c), c=self.c)
        return output

    def extra_repr(self):
        return 'c={}'.format(self.c)
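For context, a minimal usage sketch of the layer above, assuming the repo's PoincareBall manifold (manifolds/poincare.py) and a sparse, already-normalized adjacency matrix; everything outside the class itself is illustrative:

# Hypothetical usage sketch; PoincareBall and its expmap0/logmap0/proj methods
# are assumed to match the manifolds package in this repo.
import torch
from manifolds.poincare import PoincareBall

manifold = PoincareBall()
c = torch.tensor([1.0])                                   # curvature
agg = HypAgg(manifold, c, in_features=16, dropout=0.0)

x = manifold.proj(manifold.expmap0(0.01 * torch.randn(5, 16), c=c), c=c)  # points on the ball
adj = torch.eye(5).to_sparse()                            # stand-in for a normalized adjacency
out = agg(x, adj)                                         # aggregated hyperbolic features, shape (5, 16)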
cjissmart commented 4 years ago

Yeah, I also noticed that it seemingly aggregates (sums) the feature vectors of the neighboring nodes with the same weight (1). This way of weighting the aggregation operator matches neither GCN nor GAT, which really confuses me.

ines-chami commented 4 years ago

Hi, we added back the hyperbolic attention mechanism (att_0) with an option (--use-att) to use it or not (this might make HGCN a bit slower due to dense matrix multiplications). Aggregation without attention is not using weights 1 since the adjacency matrix is normalized (see data_utils.py).
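For reference, a minimal sketch of the kind of normalization meant here (illustrative only, not the exact data_utils.py code): with self-loops and row-normalization, each neighbor contributes with weight 1 / (deg(i) + 1) rather than 1.

# Hedged sketch of adjacency normalization (assumed, not copied from data_utils.py).
import numpy as np
import scipy.sparse as sp

def normalize_adj(adj):
    adj = adj + sp.eye(adj.shape[0])         # add self-loops
    rowsum = np.array(adj.sum(1)).flatten()  # degree + 1 per row
    d_inv = 1.0 / rowsum                     # safe: self-loops guarantee rowsum >= 1
    return sp.diags(d_inv).dot(adj)          # D^{-1} (A + I): rows sum to 1, weights != 1

adj = sp.csr_matrix(np.array([[0., 1., 1.],
                              [1., 0., 0.],
                              [1., 0., 0.]]))
print(normalize_adj(adj).toarray())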

LyndonCKZ commented 4 years ago

> Hi, we added back the hyperbolic attention mechanism (att_0) with an option (--use-att) to use it or not (this might make HGCN a bit slower due to dense matrix multiplications). Aggregation without attention is not using weights 1 since the adjacency matrix is normalized (see data_utils.py).

Thanks. It seems the attention implementation is only usable for small datasets so far. It is based on a sigmoid function that reweights the adjacency matrix, which differs from the softmax-style attention described in the paper. Moreover, the paper suggests that aggregation is better performed in the tangent space around each center node, whereas the code performs it entirely in the tangent space of the origin.
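To make the two points concrete, here is a minimal sketch of the paper's aggregation variant: softmax attention over neighbors, with aggregation in the tangent space of each center node x_i. The manifold.logmap/expmap calls at an arbitrary base point are assumptions about the API and may not match this repo exactly:

# Hedged sketch of per-node tangent-space aggregation with softmax attention.
# manifold.logmap(p, q, c) = log_p(q) and manifold.expmap(u, p, c) = exp_p(u)
# are assumed signatures, not necessarily this repo's exact API.
import torch
import torch.nn.functional as F

def per_node_hyp_agg(manifold, c, x, adj_dense, att_logits):
    # att_logits, adj_dense: dense [N, N]; mask non-edges before the softmax.
    # Assumes every node has at least one neighbor (e.g. a self-loop).
    masked = torch.where(adj_dense > 0, att_logits, torch.full_like(att_logits, float('-inf')))
    w = F.softmax(masked, dim=-1)                        # attention weights per center node
    out = []
    for i in range(x.size(0)):
        u = manifold.logmap(x[i].expand_as(x), x, c)     # neighbors in the tangent space at x_i
        agg = (w[i].unsqueeze(-1) * u).sum(dim=0)        # weighted tangent-space average
        out.append(manifold.expmap(agg, x[i], c))        # back to the manifold at x_i
    return torch.stack(out, dim=0)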

ines-chami commented 4 years ago

See #18