chenweize1998 / fully-hyperbolic-nn

Code for paper Fully Hyperbolic Neural Networks

Unable to reproduce results #7

Closed goru001 closed 1 year ago

goru001 commented 1 year ago

Hi @chenweize1998 - Thanks for the great work. I was trying to reproduce the machine translation results, but for some reason they seem to be way off. It would be great if you could point me in the right direction regarding what I might be doing wrong.

  1. I am using Python 3.7.10, PyTorch 1.7.0+cu110, and geoopt 0.3.1.
  2. When I train on iwslt14 following the instructions in the README here, the model gets stuck at around validation accuracy ~6 and validation perplexity ~880. After 40k steps, the BLEU score is 0.

Let me know if I should add more details that might help. I didn't make any changes to the code.

goru001 commented 1 year ago

Figured out that this commit introduced the bug in the linear layer.

kschwethelm commented 1 year ago

Hi @goru001,

when I tried to reproduce the results myself, I noticed different implementations of the Lorentz linear layer between experiments.

In your case, looking at equation 7 of the paper, only the space values should be added for the residual connection.

$\phi(Wo, v, x) = \frac{\lambda \sigma(v^To)}{||Wh(o)+x_s||}(Wh(o)+x_s)$

This means this code block:

https://github.com/chenweize1998/fully-hyperbolic-nn/blob/e88536a786f776b69712bcfca223ba88eb6932b5/mt/onmt/modules/hyper_nets.py#L51-L54

should be changed to:

```python
# compute the time component from x before the bias is added,
# then slice out only the space components (indices 1..d-1)
time = x.narrow(-1, 0, 1).sigmoid() * self.scale.exp() + 1.1
if bias is not None:
    x = x + bias
x_narrow = x.narrow(-1, 1, x.shape[-1] - 1)
```
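For readers unfamiliar with `Tensor.narrow`, the key point is that the final slice keeps only the space components (indices 1 to d-1) and drops the time component at index 0. A plain-Python stand-in (illustrative, not the repo's tensor code):

```python
def narrow_last(x, start, length):
    """Mimic torch.Tensor.narrow(-1, start, length) on a flat list."""
    return x[start:start + length]

# A Lorentz vector is laid out as [time, space_1, ..., space_n].
x = [5.0, 0.5, 0.25, 0.125]
time = narrow_last(x, 0, 1)               # [5.0]
x_narrow = narrow_last(x, 1, len(x) - 1)  # [0.5, 0.25, 0.125]
print(time, x_narrow)
```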

This aligns with the implementation in kg:

https://github.com/chenweize1998/fully-hyperbolic-nn/blob/e88536a786f776b69712bcfca223ba88eb6932b5/kg/LorentzModel.py#L12-L15

Could you try using this fix?
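For intuition, equation 7 can be sketched end-to-end with NumPy. Here $\lambda$, $h$ (taken as identity), and the shapes are simplified illustrative assumptions, not the repo's actual layer:

```python
import numpy as np

def lorentz_linear_space(W, v, o, x_s, lam=1.0):
    """Sketch of Eq. 7:
    phi(Wo, v, x) = lam * sigmoid(v^T o) / ||W h(o) + x_s|| * (W h(o) + x_s),
    with h = identity for simplicity and x_s the *space* part of the
    residual input (the time component is not added here)."""
    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
    z = W @ o + x_s  # residual adds space values only
    return lam * sigmoid(v @ o) / np.linalg.norm(z) * z

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
v = rng.standard_normal(4)
o = rng.standard_normal(4)
x_s = rng.standard_normal(3)

phi = lorentz_linear_space(W, v, o, x_s)
# By construction, the output norm equals lam * sigmoid(v^T o).
print(np.linalg.norm(phi))
```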

Additionally, you can see another inconsistency in this example: in mt the scale parameter is exponentiated, while in kg it is not.

chenweize1998 commented 1 year ago

@goru001 Hi! Thanks for your interest! The commit you mentioned, which I meant to fix a bug with, was actually the one that introduced the bug! I'm really sorry about that. You can revert to the previous commit and try again. If you run into any further problems, please let me know.

chenweize1998 commented 1 year ago

@kschwethelm Thanks for your help in answering the question! The code in mt does have a bug and should be fixed as you said; I mistakenly introduced it in a previous commit. As for the inconsistency in the scale parameter between kg and mt: in kg we fix the scale parameter, while in mt we make it trainable. To ensure that scale stays positive during training, we exponentiate it; the exponentiation is unnecessary when scale is fixed, as in kg. The two implementations are functionally consistent.
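The positivity argument can be illustrated without any framework: exponentiating an unconstrained parameter always yields a positive effective scale, which is why a trainable scale is stored in log-space while a fixed positive constant can be used directly. A minimal sketch with an illustrative helper name:

```python
import math

def effective_scale(raw_scale, trainable):
    """If the scale is a free (trainable) parameter, exponentiate it so the
    effective value stays positive for any real raw_scale; a fixed scale can
    be used as-is, since it is chosen positive up front."""
    return math.exp(raw_scale) if trainable else raw_scale

# A trainable raw value may drift negative during optimization ...
print(effective_scale(-3.0, trainable=True))   # ~0.0498, still positive
# ... while a fixed positive constant needs no transform.
print(effective_scale(2.0, trainable=False))   # 2.0
```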

kschwethelm commented 1 year ago

Hi @chenweize1998, firstly, I want to thank you for answering all questions and continuing to maintain the code.

I want to correct myself: the linear layer has slightly different implementations, each optimized for its application. You are correct about the functional consistency.

With your latest fix, you add all dimensions of the vector x, while in equation 7 of the paper you propose adding only the space dimensions. I don't think this will affect performance much, but kg follows equation 7 and mt does not; maybe consider using the same implementation in both. Note that in commit f7b47545277ef63adf74e4e79f0e78b0d8f306d3 you fixed kg to match equation 7, and in commit 07a3b579f3d3d17bb50bf98d7ae24b92942b4578 you introduced the bug mentioned here, so perhaps you intended to set both to equation 7.

Current situation:

kg follows equation 7: https://github.com/chenweize1998/fully-hyperbolic-nn/blob/9d747ab73a847662b1a16f6f21ec66b509c84610/kg/LorentzModel.py#L12-L15

mt does not follow equation 7: https://github.com/chenweize1998/fully-hyperbolic-nn/blob/9d747ab73a847662b1a16f6f21ec66b509c84610/mt/onmt/modules/hyper_nets.py#L51-L54

chenweize1998 commented 1 year ago

@kschwethelm Yeah, you are basically correct. I meant to set both to equation 7, but didn't double-check and thus introduced the bug. Right now I don't have time to rerun the experiments on the modified code, so for the sake of reproducibility I prefer to leave it unchanged for now. I will add a comment in the code. Thanks for the correction!

goru001 commented 1 year ago

Thanks @chenweize1998 and @kschwethelm for your prompt responses.