Closed Epsilon-Lee closed 6 years ago
Hi, Google's implementation includes this in modalities.py
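For anyone reading along, here is a rough sketch of what I understand the scaling to be. It assumes the factor is sqrt(d_model), as described in the Transformer paper, and that the table is initialized with standard deviation d_model ** -0.5. Both are my assumptions, and the helper names (`make_embedding_table`, `embed`, `rms`) are made up for illustration; none of this is copied from modalities.py.

```python
import math
import random


def make_embedding_table(vocab_size, d_model, seed=0):
    """Hypothetical table; assumes a normal init with std = d_model ** -0.5."""
    rng = random.Random(seed)
    std = d_model ** -0.5
    return [[rng.gauss(0.0, std) for _ in range(d_model)]
            for _ in range(vocab_size)]


def embed(table, token_ids, scale=True):
    """Look up token embeddings, optionally multiplying by sqrt(d_model)."""
    d_model = len(table[0])
    factor = math.sqrt(d_model) if scale else 1.0
    return [[factor * x for x in table[t]] for t in token_ids]


def rms(vec):
    """Root-mean-square magnitude of a vector's components."""
    return math.sqrt(sum(x * x for x in vec) / len(vec))


table = make_embedding_table(vocab_size=100, d_model=512)
unscaled = embed(table, [3, 7], scale=False)
scaled = embed(table, [3, 7], scale=True)

# Compare the typical component size with and without scaling.
print(rms(unscaled[0]), rms(scaled[0]))
```

Under these assumptions, the scaled components have a typical magnitude near 1, which is the same order as sinusoidal position encodings bounded in [-1, 1]. That may be part of the motivation, but it is only a guess.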
Thanks for pointing that out, I missed that detail! BTW, do you have any insights or intuitions on why the embeddings are scaled? Many thanks.
My guess is that scaling simply works better empirically than no scaling. For an accurate explanation, it would be better to ask the Tensor2Tensor developers.
Thanks a lot, I will do that.
Does this influence the final performance? I haven't seen it in Google's implementation. Many thanks!