Hello! Sorry to disturb you; I have some questions about the experiments mentioned in the paper, as follows:
As mentioned in the article, the following formula is used in the final embedding update process:
$h_e := (1 - \text{no\_grad}(p(e)) + p(e)) \cdot h_e$
However, I achieved better results by simply using $h_e := p(e) \cdot h_e$. According to the explanation given in the article, my understanding is that the straight-through method saves computation cost. Besides that, are there any other advantages? Or, if my understanding is wrong, could you share your original consideration?
The layer parameter given in the experimental configuration for UMLS is 5, but under this setting I cannot reproduce the corresponding result in the paper; when the layer parameter is set to 8, however, I achieve slightly higher results than those reported.
The straight-through method back-propagates through the sampling signal without changing the values of the entities' representations ($h_e$): since $1 - \text{no\_grad}(p(e)) + p(e)$ evaluates to $1$ in the forward pass, $h_e$ is left intact, while gradients still flow through the non-detached $p(e)$ term. You are welcome to try other kinds of calculation (e.g., $h_e = p(e) \cdot h_e$) that might be empirically better.
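For illustration, here is a minimal sketch of the two update rules in PyTorch (the names `straight_through_update`, `plain_update`, `h_e`, and `p_e` are mine, not from the released code):

```python
import torch

def straight_through_update(h_e: torch.Tensor, p_e: torch.Tensor) -> torch.Tensor:
    # Forward: (1 - p + p) == 1, so h_e is returned unchanged.
    # Backward: the detached term carries no gradient, so gradients
    # reach p_e only through the second, non-detached occurrence.
    return (1.0 - p_e.detach() + p_e) * h_e

def plain_update(h_e: torch.Tensor, p_e: torch.Tensor) -> torch.Tensor:
    # Forward: h_e is rescaled by the sampling signal p_e.
    return p_e * h_e

h = torch.randn(4, 8, requires_grad=True)  # entity representations
p = torch.rand(4, 1, requires_grad=True)   # per-entity sampling signal
out = straight_through_update(h, p)
assert torch.allclose(out, h)              # forward values are unchanged
out.sum().backward()                       # gradients still reach p
```

The assertion makes the difference concrete: the straight-through form keeps the forward values intact, while the plain form rescales them, which is why the two variants can behave differently in training.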
We provide two kinds of reproduction scripts; you can choose either of them. Note that there is randomness when training from scratch. You can run more training trials, or try new hyper-parameters as you have done, which can lead to even better results. :)
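If you want to reduce run-to-run variance while comparing hyper-parameters, a minimal seeding sketch, assuming PyTorch (the helper name and seed value are arbitrary):

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    # Fix the common RNG sources so repeated from-scratch runs
    # are comparable across trials.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```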