Hello! Sorry to disturb you; I have some questions about the experiments mentioned in the paper, as follows:
As mentioned in the article, the following formula is used in the final embedding update process:
$h_e := (1 - \text{no\_grad}(p(e)) + p(e)) \cdot h_e$
However, I achieved better results by simply using $h_e := p(e) \cdot h_e$. According to the explanation given in the article, my understanding is that the straight-through method saves computation cost. Besides that, are there any other advantages? Or, if my understanding is wrong, could you share your original consideration?
The layer parameter given in the experimental configuration for UMLS is 5, but under this setting I cannot reproduce the corresponding result in the paper; when the layer parameter is set to 8, however, I achieve slightly higher results than those reported.
The straight-through method back-propagates through the sampling signal without changing the values of the entities' representations ($h_e$): since $1 - \text{no\_grad}(p(e)) + p(e)$ evaluates to $1$ in the forward pass, $h_e$ is left intact, while gradients still flow through the non-detached $p(e)$ term. You are welcome to try other kinds of calculation (e.g., $h_e = p(e) \cdot h_e$) that might be empirically better.
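For illustration, here is a minimal sketch of the two update rules in PyTorch (the names `straight_through_update`, `plain_update`, `h_e`, and `p_e` are mine, not from the released code):

```python
import torch

def straight_through_update(h_e: torch.Tensor, p_e: torch.Tensor) -> torch.Tensor:
    # Forward: (1 - p + p) == 1, so h_e is returned unchanged.
    # Backward: the detached term carries no gradient, so gradients
    # reach p_e only through the second, non-detached occurrence.
    return (1.0 - p_e.detach() + p_e) * h_e

def plain_update(h_e: torch.Tensor, p_e: torch.Tensor) -> torch.Tensor:
    # Forward: h_e is rescaled by the sampling signal p_e.
    return p_e * h_e

h = torch.randn(4, 8, requires_grad=True)  # entity representations
p = torch.rand(4, 1, requires_grad=True)   # per-entity sampling signal
out = straight_through_update(h, p)
assert torch.allclose(out, h)              # forward values are unchanged
out.sum().backward()                       # gradients still reach p
```

The assertion makes the difference concrete: the straight-through form keeps the forward values intact, while the plain form rescales them, which is why the two variants can behave differently in training.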
We provide two kinds of reproduction scripts; you can choose either of them. Note that there is randomness when training from scratch. You can run more training trials, or try new hyper-parameters as you have done, which can lead to even better results. :)
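If you want to reduce run-to-run variance while comparing hyper-parameters, a minimal seeding sketch, assuming PyTorch (the helper name and seed value are arbitrary):

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    # Fix the common RNG sources so repeated from-scratch runs
    # are comparable across trials.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```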