facebookresearch / hgnn

Hyperbolic Graph Neural Networks

what is "embed_manifold" used for? #16

Open daidaren202 opened 4 years ago

daidaren202 commented 4 years ago

Hi, thanks for your great work. I was wondering what the hyper-parameter 'embed_manifold' is used for? Looking forward to your reply.

leuchine commented 4 years ago

Hi, thanks for your interest. embed_manifold selects whether the input embeddings are in Euclidean space or in hyperbolic space. If they are in Euclidean space, we first need to apply the exp map to transform them into the hyperbolic space before applying HGNN. If they are already in hyperbolic space, that mapping is not necessary, but Riemannian optimizers such as Riemannian ADAM will be used to update these embeddings. Thanks!

Best Regards, Qi
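For reference, the exp-map step described above can be sketched in a few lines of pure Python (the repo itself uses PyTorch; the function names here are illustrative, not the repo's API). The exponential map at the Lorentz-model origin o = (1, 0, ..., 0) sends a Euclidean tangent vector v to a point on the hyperboloid:

```python
import math

def exp_map_origin(v):
    """Map a Euclidean (tangent) vector v onto the Lorentz manifold
    via the exponential map at the origin o = (1, 0, ..., 0):
    exp_o(v) = (cosh||v||, sinh||v|| * v / ||v||)."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        return [1.0] + [0.0] * len(v)
    return [math.cosh(n)] + [math.sinh(n) * c / n for c in v]

def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x_0 y_0 + sum_i x_i y_i."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

# A Euclidean input embedding, mapped into the hyperbolic space:
x = exp_map_origin([0.3, -0.5])
# Points on the Lorentz manifold satisfy <x, x>_L = -1 (up to float error):
print(round(lorentz_inner(x, x), 6))  # -1.0
```

After this mapping, all subsequent HGNN operations can work with hyperbolic representations.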

daidaren202 commented 4 years ago

Thank you for your careful answer. I'm a little confused about the latter case. If the input embeddings are in the hyperbolic space, doesn't HGNN guarantee that the intermediate and final embeddings stay in the hyperbolic space? Why do we need RADAM to update these embeddings?

Best Regards.

leuchine commented 4 years ago

Plain gradient descent in Euclidean space (SGD) relies on multiplication and addition, and these operations take different forms in hyperbolic space. HGNN can only guarantee what you said in the forward pass. If you use the Euclidean operations directly, the backward pass may not preserve that property (e.g. the embeddings may cross the Poincaré ball boundary). More details can be found at https://openreview.net/forum?id=r1eiqi09K7. Thanks!

Best Regards, Qi

daidaren202 commented 4 years ago

Sorry for the late reply. I thought optimizers (SGD, RSGD, etc.) are used to optimize the ''parameters''. For example, given x, a model outputs y = ReLU(Wx), and the optimizer updates "W". If the input embedding x is in hyperbolic space, the output embedding y is obtained by computation. Why do we need RADAM to update embeddings that are obtained by computation? As for what you said, "... but Riemannian optimizers like Riemannian ADAM will be used to update these embeddings," I thought the exp map ensures the embeddings are in the hyperbolic space. Is there anything wrong with my understanding? If so, I hope you can point it out. Thanks again for your patient answer.

leuchine commented 4 years ago

Sorry, I am not sure whether I understand your question. Here the parameters refer to the embeddings themselves. The exp map and HGNN ensure that representations are in the hyperbolic space during the forward computation, but during backpropagation, Euclidean optimizers cannot guarantee this property. An extreme example: the updated parameters will not be in the hyperbolic space if you set the learning rate alpha to a very large value (e.g. 1e6). So Euclidean parameters and hyperbolic parameters (which are initialized in the hyperbolic space and do not need the exp map to transform them) require different optimizers. Thanks!

Best Regards, Qi
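The boundary-crossing problem above is easy to reproduce with a toy Poincaré-ball example (a hypothetical pure-Python sketch, not the repo's code): a plain Euclidean SGD step on a point near the unit boundary can push it outside the ball, which a Riemannian update, or in spirit a projection back inside, avoids.

```python
import math

def norm(p):
    return math.sqrt(sum(c * c for c in p))

p = [0.9, 0.0]       # embedding inside the unit (Poincare) ball
grad = [-1.0, 0.0]   # some Euclidean gradient
lr = 0.5             # large learning rate

# Plain Euclidean SGD step: p - lr * grad
q_raw = [pi - lr * gi for pi, gi in zip(p, grad)]
print(norm(q_raw) > 1.0)  # True: the point has left the ball

# A projection back inside the ball (what Riemannian optimizers
# effectively guarantee via their retraction step):
eps = 1e-5
n = norm(q_raw)
q = [c / n * (1.0 - eps) for c in q_raw] if n >= 1.0 else q_raw
print(norm(q) < 1.0)  # True
```

This is only an illustration of the failure mode; the actual Riemannian update rescales the gradient by the metric and follows a geodesic rather than projecting.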

leuchine commented 4 years ago

The embeddings are the input to HGNN. In Eq. (2), they refer to h^0, i.e. the initial hidden states. W^k is a Euclidean parameter matrix.

You can reach me by email (qi.liu@cs.ox.ac.uk) if you prefer that. Thanks!

Best Regards, Qi

daidaren202 commented 4 years ago

Thank you very much for your detailed answers to my questions!

daidaren202 commented 4 years ago

Hi, I encountered a new problem. For the Lorentz model, the distance is defined as d(x, y) = arcosh(-<x, y>_L), as in Eq. (10). It is known that the domain of arcosh() is [1, +\infty), but for arbitrary points x, y there is no guarantee that -<x, y>_L \in [1, +\infty). In this case, I got a 'nan' while calculating the distance. Is there something wrong with my understanding?

Regards.

leuchine commented 4 years ago

Hi,

Yes, using points outside the domain will cause NaN errors. You can try the normalize function in https://github.com/facebookresearch/hgnn/blob/master/manifold/LorentzManifold.py to pull any point back onto the Lorentz manifold, ensuring that the points reside in the domain. Thanks!
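As a rough sketch of what such a normalize step does (this is an illustrative pure-Python version, not the repo's PyTorch implementation; check LorentzManifold.py for the authoritative code): keep the spatial coordinates and recompute the time coordinate so the point satisfies the manifold constraint <x, x>_L = -1.

```python
import math

def normalize(x):
    """Project a point back onto the Lorentz manifold by recomputing
    the time coordinate: x_0 = sqrt(1 + ||x_{1:}||^2), which enforces
    <x, x>_L = -x_0^2 + sum_i x_i^2 = -1."""
    spatial = list(x[1:])
    x0 = math.sqrt(1.0 + sum(c * c for c in spatial))
    return [x0] + spatial

# A point with an arbitrary (wrong) time coordinate gets pulled back:
y = normalize([5.0, 0.3, -0.4])
print(round(-y[0] * y[0] + sum(c * c for c in y[1:]), 6))  # -1.0
```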

daidaren202 commented 4 years ago

Thanks for your reply first.

The normalize function you provided seems to pull points that lie outside the Lorentz manifold back onto it. The problem I encountered is that even for points x, y lying on the Lorentz manifold, there is still no guarantee that -<x, y>_L is in the domain of arcosh().

I am confused about why the distance between two points on the Lorentz model cannot be computed. Thanks again.

Regards.

leuchine commented 4 years ago

Hi,

Sorry for the confusion. Theoretically, it is impossible for -<x, y>_L to fall outside the domain, since -<x, y>_L >= 1. But due to numerical stability issues, -<x, y>_L can sometimes come out as a value slightly below 1, like 0.999. Could you please check whether this is the case? If so, you can use torch.clamp to force it to be >= 1. Thanks!
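The clamping fix can be sketched in pure Python (torch.clamp plays the same role in the repo's PyTorch code; the helper names here are illustrative):

```python
import math

def lorentz_inner(x, y):
    # <x, y>_L = -x_0 * y_0 + sum_i x_i * y_i
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lorentz_distance(x, y):
    # In exact arithmetic -<x, y>_L >= 1, but floating-point error can
    # leave it just below 1; clamping to 1 keeps acosh in its domain.
    return math.acosh(max(-lorentz_inner(x, y), 1.0))

# 2D Lorentz points have the form (cosh t, sinh t);
# their distance works out to |t1 - t2|.
x = [math.cosh(0.5), math.sinh(0.5)]
y = [math.cosh(0.2), math.sinh(0.2)]
print(round(lorentz_distance(x, y), 6))  # 0.3
print(round(lorentz_distance(x, x), 6))  # 0.0 (no NaN thanks to the clamp)
```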

daidaren202 commented 4 years ago

Thanks first. For x, y \in L, we have -<x, x>_L = 1 and -<y, y>_L = 1. Why does -<x, y>_L >= 1 hold? I tried to use "a^2 + b^2 >= 2ab" to prove it but failed. It seems like a simple question but it does confuse me. Thanks.

Regards.

leuchine commented 4 years ago

Hi,

I forgot where I found the proof. But you can easily check the two-dimensional case (x_0, x_1) and (y_0, y_1), where x_0 = sqrt(1 + x_1^2). Then the Lorentzian inner product is less than or equal to -1, as shown in the figure at https://www.wolframalpha.com/input/?i=-sqrt%281+%2B+x%5E2%29+sqrt%281+%2B+y%5E2%29+%2B+xy. The general case can be proven using derivatives. Thanks!
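A quick numerical sanity check of the 2D claim (not a proof, just a grid sweep): for points (sqrt(1 + a^2), a) and (sqrt(1 + b^2), b), the Lorentzian inner product ab - sqrt(1 + a^2) sqrt(1 + b^2) should never exceed -1.

```python
import math

# Maximum of the Lorentzian inner product over a grid of 2D Lorentz points.
grid = [k / 10 for k in range(-50, 51)]
worst = max(
    a * b - math.sqrt(1 + a * a) * math.sqrt(1 + b * b)
    for a in grid
    for b in grid
)
# The maximum is attained at a == b, where the product equals -1 exactly
# (up to floating-point rounding):
print(worst <= -1.0 + 1e-9)  # True
```

Consistent with this, -<x, y>_L >= 1 and arcosh is well-defined in exact arithmetic.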

daidaren202 commented 4 years ago

Oh! Thanks a lot. It did help me.

daidaren202 commented 4 years ago

Sorry to bother you again. In Table 1, you reported the F1 (macro) score on synthetic data, but there is only accuracy in the code you released. Is there an additional step to calculate the F1 (macro) score that is not in this code? Thanks!

leuchine commented 4 years ago

Yes, I used https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html to calculate the F1 score. It should be easy to add this line. Thanks!
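For anyone without scikit-learn at hand, macro F1 is simple to compute by hand: per-class F1, then the unweighted mean over classes. A pure-Python sketch (sklearn.metrics.f1_score(y_true, y_pred, average="macro") gives the same result):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro average)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)

# Class 0: F1 = 1, class 1: F1 = 2/3, class 2: F1 = 2/3 -> macro = 7/9
print(round(macro_f1([0, 1, 1, 2], [0, 1, 2, 2]), 4))  # 0.7778
```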