AryaAftab / rotary-embedding-tensorflow

Implementation of Rotary Embeddings, from the Roformer paper, in Tensorflow
MIT License

Problem with the correctness of the embeddings #1

Open rakadam opened 1 year ago

rakadam commented 1 year ago

I have tested the code in the case where `dim` equals the full dimension of the feature vector (so every element is multiplied by the embedding). I computed `res2 = np.einsum("axc,ayc->xy", res, res)`, where `res` is the result of the embedding (batch x time x feature, with batch equal to 1). This matrix should show that the rotary embedding is relative, in the sense that `res2[0, 1] == res2[1, 2] == res2[2, 3]` and so on. But your code did not produce this result. I tried other rotary embeddings (from GPT-J-6B), and those produced the expected symmetries. I compared the two implementations, and at first glance they look very similar, so it is not obvious where the difference is. Maybe I was using your code wrongly? But that would be strange, because the embedded vectors look "close" to right.
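For reference, the symmetry check described above can be reproduced end to end with a small NumPy sketch. This is not the repo's code: it uses a GPT-J-style interleaved ("rotate every two") rotary implementation, which is one of the variants the relative property should hold for. The function name `rotary_embed` and the base of 10000 are illustrative choices.

```python
import numpy as np

def rotary_embed(x, base=10000):
    """Apply GPT-J-style interleaved rotary embeddings to x of shape (batch, seq, dim)."""
    b, n, d = x.shape  # d must be even
    inv_freq = 1.0 / (base ** (np.arange(0, d, 2) / d))   # (d/2,) frequencies
    angles = np.einsum("n,f->nf", np.arange(n), inv_freq)  # (n, d/2) position * freq
    cos = np.repeat(np.cos(angles), 2, axis=-1)            # broadcast each freq to its pair
    sin = np.repeat(np.sin(angles), 2, axis=-1)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    # rotate_every_two: (x0, x1) -> (-x1, x0) for each consecutive pair
    rot = np.stack([-x_odd, x_even], axis=-1).reshape(b, n, d)
    return x * cos + rot * sin

# all-ones input, so `res` contains the pure embedding vectors
x = np.ones((1, 6, 8))
res = rotary_embed(x)
res2 = np.einsum("axc,ayc->xy", res, res)

# relative property: the dot product depends only on the position offset,
# so all adjacent-position entries should match
print(np.allclose(res2[0, 1], res2[1, 2]), np.allclose(res2[1, 2], res2[2, 3]))
```

With a correct implementation both checks print `True`; the issue reports that the repo's code fails this equality.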

rakadam commented 1 year ago

I forgot to mention that the input feature vector was all ones, so the `res` variable contains the pure embedding vectors.