chengchingwen / Transformers.jl

Julia Implementation of Transformer models
MIT License

possible issues in PositionEmbedding? #92

Closed Alexander-Barth closed 2 years ago

Alexander-Barth commented 2 years ago

Thank you very much for making your work available as a Julia package! It helps me a lot to learn about Transformers (a topic I don't know much about...). I was comparing:

https://github.com/chengchingwen/Transformers.jl/blob/7a7a25a3e29977678943b71aba0891ea934dcd62/src/basic/embeds/position_embed.jl#L19-L25

to https://machinelearningmastery.com/a-gentle-introduction-to-positional-encoding-in-transformer-models-part-1/

It seems to me that the embedding contains cos(pos) (for i = 1) but not its pair sin(pos), because the index i is 1-based. If we shifted i by 1, every sin would have its matching cos and we would get all the sin/cos pairs.
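For reference, here is a minimal sketch of how I read the paper's sinusoidal encoding (this is not the package code, and the function name is mine); it maps Julia's 1-based row index back to the paper's 0-based dimension index so that every sin has its cos pair at the same frequency:

```julia
# Minimal sketch of the sinusoidal position encoding from "Attention Is All You Need";
# not the Transformers.jl implementation, just my reading of the formula.
function sinusoid_position_encoding(d_model::Int, max_len::Int)
    pe = Matrix{Float32}(undef, d_model, max_len)
    for pos in 1:max_len, j in 1:d_model
        k = j - 1                                   # paper's 0-based dimension index
        angle = pos / 10000.0^(2 * fld(k, 2) / d_model)
        pe[j, pos] = iseven(k) ? sin(angle) : cos(angle)  # even k -> sin, odd k -> its cos pair
    end
    return pe
end
```

With this indexing, rows 1 and 2 (k = 0 and 1) form the sin/cos pair at the same frequency, which is what I would expect from the paper.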

Of course, this is a very new topic to me and I might be completely wrong here (or it simply doesn't matter...). I can make a PR if this is indeed an issue.

chengchingwen commented 2 years ago

Generally it doesn't matter that much. That function just creates periodic functions to serve as bases, so as long as the bases have similar properties, the training behavior will be similar.

OTOH, it is an issue if we want the implementation to match the original paper exactly. The index shift means we are actually using a completely different set of bases for the position encoding. We should change it so the values match the paper's. A PR is appreciated :)