goru001 closed this issue 1 year ago
Figured out that this commit introduced the bug in the linear layer.
Hi @goru001,
when I tried to reproduce the results myself, I noticed different implementations of the Lorentz linear layer between experiments.
Looking at equation 7 of the paper, only the space values should be added in the residual connection.
$\phi(Wo, v, x) = \frac{\lambda\,\sigma(v^\top o)}{\lVert Wh(o) + x_s \rVert}\,(Wh(o) + x_s)$
This means this code block:
should be changed to:
```python
time = x.narrow(-1, 0, 1).sigmoid() * self.scale.exp() + 1.1
if bias is not None:
    x = x + bias
x_narrow = x.narrow(-1, 1, x.shape[-1] - 1)
```
This aligns with the implementation in kg:
Could you try using this fix?
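In case it helps, here is a minimal, self-contained sketch of a Lorentz linear forward pass with that ordering. The function and argument names are my own, not the repo's, and I assume fixed curvature $-1$, so the output should satisfy the hyperboloid constraint $x_0^2 - \lVert x_s \rVert^2 = 1$:

```python
import torch

def lorentz_linear(x, weight, scale, bias=None):
    """Sketch of a Lorentz linear layer following the Eq.-7 ordering.

    Computing `time` from the pre-bias activation and narrowing the
    space part afterwards means the bias effectively shifts only the
    space dimensions, matching the residual-on-space-only reading.
    (Names and signature are illustrative, not the repo's API.)
    """
    x = x @ weight.t()
    # time component read off BEFORE the bias is added
    time = x.narrow(-1, 0, 1).sigmoid() * scale + 1.1
    if bias is not None:
        x = x + bias  # bias now only reaches the space part
    x_narrow = x.narrow(-1, 1, x.shape[-1] - 1)
    # rescale the space part so the output lies on the hyperboloid
    # time^2 - ||space||^2 = 1 (curvature -1 assumed)
    denom = (x_narrow * x_narrow).sum(dim=-1, keepdim=True).clamp_min(1e-8)
    s = ((time * time - 1) / denom).sqrt()
    return torch.cat([time, x_narrow * s], dim=-1)

torch.manual_seed(0)
x = torch.randn(4, 5)
w = torch.randn(5, 5)
y = lorentz_linear(x, w, scale=torch.tensor(10.0), bias=torch.randn(5))
```

Every row of `y` then satisfies `y[0]**2 - ||y[1:]||**2 = 1` up to numerical precision, regardless of the bias.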
Additionally, there is another inconsistency visible in this example: in mt the scale parameter is exponentiated, while in kg it is not.
@goru001 Hi! Thanks for your interest! The commit you mentioned, which I intended as a bug fix, was actually the one that introduced the bug! I'm really sorry for that. You can revert to the previous commit and try again. If you encounter any further problems, please let me know.
@krxxxxxxxanc Thanks for your help in answering the question! The code in `mt` does have a bug and should be fixed as you said; I mistakenly introduced it in a previous commit. As for the inconsistency in the `scale` parameter between `kg` and `mt`: in `kg` we fix the `scale` parameter, while in `mt` we set it to be trainable. To ensure that `scale` is always positive during training, we exponentiate it. The exponentiation is unnecessary when `scale` is fixed, as in `kg`. The code is functionally consistent.
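The positivity argument can be seen in a two-line sketch (plain PyTorch, no project code assumed; the values are made up):

```python
import torch

# A trainable scale stored as an unconstrained parameter; exponentiating
# it guarantees the effective scale stays positive no matter where the
# optimizer drives the raw value.
raw_scale = torch.nn.Parameter(torch.tensor(0.0))  # unconstrained
effective_scale = raw_scale.exp()                  # always > 0

# Even after updates push the raw parameter negative...
raw_scale.data -= 5.0
```

The raw parameter is now negative, but `raw_scale.exp()` is still strictly positive, which is exactly why the trainable variant in `mt` exponentiates while the fixed variant in `kg` does not need to.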
Hi @chenweize1998, firstly, I want to thank you for answering all questions and continuing to maintain the code.
I want to correct myself and say that the linear layer has slightly different implementations optimized for each application. You are correct about the functional consistency.
With your latest fix, you are adding all dimensions of the vector x, while in equation 7 of the paper you only propose to add the space dimensions. While I don't think this will affect performance much, kg follows equation 7 and mt does not. Maybe consider changing both to the same implementation. Note that in commit f7b47545277ef63adf74e4e79f0e78b0d8f306d3 you fixed kg to follow equation 7, and in commit 07a3b579f3d3d17bb50bf98d7ae24b92942b4578 you introduced the bug mentioned here. Perhaps you intended to set both to equation 7.
Current situation:
kg follows equation 7: https://github.com/chenweize1998/fully-hyperbolic-nn/blob/9d747ab73a847662b1a16f6f21ec66b509c84610/kg/LorentzModel.py#L12-L15
mt does not follow equation 7: https://github.com/chenweize1998/fully-hyperbolic-nn/blob/9d747ab73a847662b1a16f6f21ec66b509c84610/mt/onmt/modules/hyper_nets.py#L51-L54
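For what it's worth, the difference between the two orderings is easy to see on a toy tensor. This is a hedged sketch: the inputs, the bias, and the scale value 10.0 are made up for illustration and not taken from the repo:

```python
import torch

x = torch.tensor([[0.5, 1.0, -1.0]])      # toy pre-activation, time first
bias = torch.tensor([1.0, 0.0, 0.0])      # bias with a nonzero time entry

# mt-style (current): bias is added before the time component is read off,
# so the bias leaks into the time coordinate.
t_mt = (x + bias).narrow(-1, 0, 1).sigmoid() * 10.0 + 1.1

# Eq.-7 / kg-style: time is computed first, so the bias only ever reaches
# the space part.
t_kg = x.narrow(-1, 0, 1).sigmoid() * 10.0 + 1.1
```

The two time components disagree whenever the bias has a nonzero time entry, which is the discrepancy between the two files.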
@krxxxxxxxanc Yeah, basically you are correct. I meant to set both to equation 7, but didn't double check and thus caused the bug. Right now I don't have the time to rerun the experiments on the modified code. For the sake of reproducibility, I prefer to leave it unchanged for now. I will add a comment in the code. Thanks for your correction!
Thanks @chenweize1998 and @krxxxxxxxanc for your prompt responses.
Hi @chenweize1998 - Thanks for the great work. I was trying to reproduce the machine translation results, but for some reason my results seem to be way off. It'd be great if you could point me in the right direction regarding what I might be doing wrong.
Let me know if I should add more details which might help you. I didn't make any changes to the code though.