dalab / hyperbolic_nn

Source code for the paper "Hyperbolic Neural Networks", https://arxiv.org/abs/1805.09112
Apache License 2.0

Hyperbolic MLR function #2

Open · dutchJSCOOP opened this issue 4 years ago

dutchJSCOOP commented 4 years ago

Hi! First of all, thank you for this research, it is very fascinating.

I have a question about your implementation of the hyperbolic MLR function. You compute the logit as: 2. / np.sqrt(c) * |A_mlr| * arcsinh(np.sqrt(c) * pxdota * lambda_px)
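
For concreteness, this is how I read that computation in NumPy (my own sketch, not your actual code; mobius_add is written out from the formula in the paper, and the variable names just mirror the ones above):

```python
import numpy as np

def mobius_add(u, v, c):
    # Mobius addition on the Poincare ball of curvature -c (my paraphrase of the paper's formula).
    uv = u @ v
    u2 = u @ u
    v2 = v @ v
    num = (1 + 2 * c * uv + c * v2) * u + (1 - c * u2) * v
    den = 1 + 2 * c * uv + c ** 2 * u2 * v2
    return num / den

def mlr_logit(x, p, a_mlr, c):
    # 2. / np.sqrt(c) * |A_mlr| * arcsinh(np.sqrt(c) * pxdota * lambda_px)
    minus_p_plus_x = mobius_add(-p, x, c)
    pxdota = minus_p_plus_x @ (a_mlr / np.linalg.norm(a_mlr))      # l2-normalized A_mlr
    lambda_px = 2. / (1 - c * (minus_p_plus_x @ minus_p_plus_x))
    return 2. / np.sqrt(c) * np.linalg.norm(a_mlr) * np.arcsinh(np.sqrt(c) * pxdota * lambda_px)

# tiny example, points chosen inside the unit ball for c = 1
x = np.array([0.1, -0.2, 0.05])
p = np.array([0.05, 0.0, 0.1])
a = np.array([1.0, 2.0, -1.0])
print(mlr_logit(x, p, a, c=1.0))
```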

First, my question regards the l2 normalization of A_mlr when creating pxdota: why do you do this?

Second, seeing how lambda_px = 2. / (1 - c * |minus_p_plus_x|^2), I find it difficult to see how your implementation of the hyperbolic MLR is equivalent to the definition of P(y=k | x) in your paper:
lambda_p * |A_mlr| / np.sqrt(c) * arcsinh(2 * np.sqrt(c) * pxdota / ((1 - c * |minus_p_plus_x|^2) * |A_mlr|))
= 2. / (np.sqrt(c) * (1 - c * |p|^2)) * |A_mlr| * arcsinh(np.sqrt(c) * pxdota * lambda_px / |A_mlr|)

It seems to me that the 1/(1 - c * |p|^2) factor in front of the arcsinh and the 1/|A_mlr| factor inside the arcsinh are missing, but I can't figure out where they went!

Does it have to do with the fact that the variable A_mlr first needs to be scaled by (lambda_0 / lambda_p) so that it can be optimized as a Euclidean parameter?

I am currently writing a paper that makes extensive use of the definitions in your paper, and I would like to keep my implementation as close as possible to yours.

EDIT: I just realized that the l2 normalization is the implicit 1/|A_mlr|. This just leaves the missing 1/(1 - c * |p|^2).

octavian-ganea commented 4 years ago

Thanks for your nice words.

  1. pxdota absorbs the a_k normalization inside the arcsinh of the MLR formula (eq. 23 in https://papers.nips.cc/paper/7780-hyperbolic-neural-networks.pdf). So pxdota in our code is actually <-p_k ⊕ x, a_k / ||a_k||>.
  2. You are right: A_mlr is actually a'_k as denoted in our paper, which is a Euclidean parameter, so lambda_{p_k} is absorbed inside ||a'_k||. See the quick check below.
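
Concretely, here is a quick standalone check of that absorption (just a sketch, not the repo code; z stands in for -p_k ⊕ x, and a'_k is the Euclidean parameter, scaled exactly as you guessed above, with lambda_0 = 2):

```python
import numpy as np

def paper_logit(z, p, a, c):
    # Logit from eq. 23, with z = (-p_k) (+)_c x precomputed and a_k in the tangent space at p_k.
    lam_p = 2. / (1. - c * (p @ p))
    na = np.linalg.norm(a)
    return lam_p * na / np.sqrt(c) * np.arcsinh(
        2. * np.sqrt(c) * (z @ a) / ((1. - c * (z @ z)) * na))

def code_logit(z, a_prime, c):
    # The expression from the implementation, with the Euclidean parameter a'_k (A_mlr).
    lam_z = 2. / (1. - c * (z @ z))                       # lambda_px
    pxdota = z @ (a_prime / np.linalg.norm(a_prime))      # l2-normalized dot product
    return 2. / np.sqrt(c) * np.linalg.norm(a_prime) * np.arcsinh(np.sqrt(c) * pxdota * lam_z)

rng = np.random.default_rng(0)
c = 1.0
p = 0.1 * rng.normal(size=3)        # hyperbolic offset p_k
z = 0.1 * rng.normal(size=3)        # plays the role of (-p_k) (+)_c x
a = rng.normal(size=3)              # tangent-space normal a_k
lam_p = 2. / (1. - c * (p @ p))
a_prime = (lam_p / 2.) * a          # lambda_{p_k} absorbed into the Euclidean parameter
print(np.isclose(paper_logit(z, p, a, c), code_logit(z, a_prime, c)))   # -> True
```
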
dutchJSCOOP commented 4 years ago

Thank you. I have a more theoretical question as well. You derive the hyperbolic MLR from the Euclidean one, p(y=k|x) ∝ exp(<a_k, x> - b_k) = exp(f(x)). Here, f(x) = Ax - b is just a fully connected / feed-forward layer (with bias) in the standard Euclidean case, without the non-linearity. I therefore expected the hyperbolic feed-forward layer to simply be the hyperbolic MLR layer without the exponential (and normalization) and with a hyperbolic non-linearity.

Instead, you define the feed-forward layer as exp_0(f(log_0(x))), i.e. performing the matrix multiplication in the tangent space at 0 and then mapping back to hyperbolic space. Are these two notions equivalent? And why can you not just formulate the MLR as p(y=k|x) = exp(mobius_add(mobius_mult(x, A), b))? This is a bit outside the scope of a GitHub issue, so if you prefer I can shoot you an email.
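
To make sure we mean the same thing, this is the tangent-space layer I am referring to (my own sketch of exp_0 / log_0 as given in your paper; I leave out the Mobius bias addition and assume nonzero inputs):

```python
import numpy as np

def exp0(v, c):
    # Exponential map at the origin of the Poincare ball with curvature -c.
    nv = np.linalg.norm(v)
    return np.tanh(np.sqrt(c) * nv) * v / (np.sqrt(c) * nv)

def log0(y, c):
    # Logarithmic map at the origin (inverse of exp0).
    ny = np.linalg.norm(y)
    return np.arctanh(np.sqrt(c) * ny) * y / (np.sqrt(c) * ny)

def tangent_space_layer(x, M, c):
    # Feed-forward layer as exp_0(f(log_0(x))) with f(v) = M v;
    # the bias would be applied afterwards via Mobius addition.
    return exp0(M @ log0(x, c), c)
```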

octavian-ganea commented 4 years ago

Good question. First, our hyperbolic MLR goes from hyperbolic space to Euclidean space, so if you want to use it and go back to hyperbolic space, you would need an additional exp_0. But I agree, you could do it both ways. We chose exp_0(f(log_0(x))) because, in this way, we recover the scalar-vector Mobius multiplication when the matrix is a scalar times the identity (see the check below), and we get the additional properties described in the paper (e.g. associativity, orthogonal preservation, etc.). You would probably lose these properties with the MLR-style feed-forward layer. However, I agree it is an interesting research direction to understand which of these layers is more powerful. Also, check https://arxiv.org/pdf/1901.06033.pdf, which uses our MLR as the decoder layer in a VAE (section 3.2).
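
For the scalar case, a quick check (a sketch only; exp_0 / log_0 are repeated from your snippet above so that this runs on its own, and mobius_scalar_mul is the r (x)_c x from the paper):

```python
import numpy as np

def exp0(v, c):
    nv = np.linalg.norm(v)
    return np.tanh(np.sqrt(c) * nv) * v / (np.sqrt(c) * nv)

def log0(y, c):
    ny = np.linalg.norm(y)
    return np.arctanh(np.sqrt(c) * ny) * y / (np.sqrt(c) * ny)

def mobius_scalar_mul(r, x, c):
    # Mobius scalar multiplication r (x)_c x on the Poincare ball.
    nx = np.linalg.norm(x)
    return np.tanh(r * np.arctanh(np.sqrt(c) * nx)) * x / (np.sqrt(c) * nx)

c, r = 1.0, 1.7
x = 0.2 * np.random.default_rng(1).normal(size=3)
# exp_0((r * I) log_0(x)) coincides with Mobius scalar multiplication r (x)_c x.
print(np.allclose(exp0((r * np.eye(3)) @ log0(x, c), c), mobius_scalar_mul(r, x, c)))   # -> True
```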

dutchJSCOOP commented 4 years ago

Great. Thanks for your clear and quick responses.