[Question] RoPE implementation is inconsistent with the paper - Githubissues

baichuan-inc / Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.

https://huggingface.co/baichuan-inc/baichuan-7B

Apache License 2.0

5.67k stars 504 forks

[Question] RoPE implementation is inconsistent with the paper #136

Open zehmaaa opened 1 year ago

zehmaaa commented 1 year ago

Required prerequisites

[X] I have read the documentation https://github.com/baichuan-inc/baichuan-7B/blob/HEAD/README.md.
[X] I have searched the Issue Tracker and Discussions that this hasn't already been reported. (+1 or comment there if it has.)
[X] Consider asking first in a Discussion.

Questions

Why does the implementation here differ from the one in the paper?

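import torch

def rotate_half(x):
    """Rotates half the hidden dims of the input."""
    # Split the last dimension into its first and second halves.
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2:]
    # Pair element i with element i + d/2: (x1, x2) -> (-x2, x1).
    return torch.cat((-x2, x1), dim=-1)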

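The computation in the paper is:

With this implementation, the final result would be:

I see that Hugging Face does it the same way, so I'm curious why this implementation was chosen?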
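
For readers following along, the two forms being compared can be sketched as below. This is a reconstruction, not the poster's original formulas. The first line is the elementwise form of RoPE given in the RoFormer paper, which rotates adjacent pairs. The second is what `x * cos + rotate_half(x) * sin` computes, assuming the `cos`/`sin` tables are built by concatenating the frequency vector with itself (an assumption about the surrounding code, which is not quoted in this issue). The names `RoPE_pair` and `RoPE_half` are labels introduced here for illustration, with $\theta_i = 10000^{-2(i-1)/d}$ and position $m$.

```latex
% Paper form: dimension 2i-1 is paired with dimension 2i
\mathrm{RoPE}_{\text{pair}}(x, m) =
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ \vdots \\ x_{d-1} \\ x_d \end{pmatrix}
\otimes
\begin{pmatrix} \cos m\theta_1 \\ \cos m\theta_1 \\ \cos m\theta_2 \\ \cos m\theta_2 \\ \vdots \\ \cos m\theta_{d/2} \\ \cos m\theta_{d/2} \end{pmatrix}
+
\begin{pmatrix} -x_2 \\ x_1 \\ -x_4 \\ x_3 \\ \vdots \\ -x_d \\ x_{d-1} \end{pmatrix}
\otimes
\begin{pmatrix} \sin m\theta_1 \\ \sin m\theta_1 \\ \sin m\theta_2 \\ \sin m\theta_2 \\ \vdots \\ \sin m\theta_{d/2} \\ \sin m\theta_{d/2} \end{pmatrix}

% rotate_half form: dimension i is paired with dimension i + d/2
\mathrm{RoPE}_{\text{half}}(x, m) =
\begin{pmatrix} x_1 \\ \vdots \\ x_{d/2} \\ x_{d/2+1} \\ \vdots \\ x_d \end{pmatrix}
\otimes
\begin{pmatrix} \cos m\theta_1 \\ \vdots \\ \cos m\theta_{d/2} \\ \cos m\theta_1 \\ \vdots \\ \cos m\theta_{d/2} \end{pmatrix}
+
\begin{pmatrix} -x_{d/2+1} \\ \vdots \\ -x_d \\ x_1 \\ \vdots \\ x_{d/2} \end{pmatrix}
\otimes
\begin{pmatrix} \sin m\theta_1 \\ \vdots \\ \sin m\theta_{d/2} \\ \sin m\theta_1 \\ \vdots \\ \sin m\theta_{d/2} \end{pmatrix}
```

Under that assumption, both forms apply the same set of 2D rotations by angles $m\theta_i$; they differ only in which two coordinates share each angle.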

Checklist

[X] I have provided all relevant and necessary information above.
[X] I have chosen a suitable title for this issue.

xinge333 commented 4 months ago

The positions of the neurons in an embedding have no inherent order, so you can simply pick any half to flip.
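
One way to check this claim is a small sketch in plain Python (no `torch`, so it runs anywhere). The helper names `rope_interleaved`, `rope_half`, `to_half`, and `dot` are hypothetical and introduced only for this illustration. `rope_half` assumes the same angle is shared by coordinates `i` and `i + d/2`, which is what the `rotate_half` formulation needs. The sketch suggests that the two layouts agree up to a fixed reordering of the hidden dimensions, so the query-key dot product is unchanged as long as the same layout is used for both queries and keys.

```python
import math


def rope_interleaved(x, pos, base=10000.0):
    """Paper-style layout: rotate adjacent pairs (x[2i], x[2i+1])."""
    d = len(x)
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def rope_half(x, pos, base=10000.0):
    """rotate_half-style layout: rotate pairs (x[i], x[i + d/2])."""
    d = len(x)
    h = d // 2
    out = [0.0] * d
    for i in range(h):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + h]
        out[i] = a * c - b * s
        out[i + h] = a * s + b * c
    return out


def to_half(x):
    """Reorder from the interleaved layout to the half layout:
    even-indexed entries first, then odd-indexed entries."""
    return x[0::2] + x[1::2]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.5, 2.0, -0.7, 1.1, 0.9, -0.4]
k = [1.0, 0.2, -0.6, 0.8, 1.5, -1.3, 0.1, 0.7]

# Attention score for query at position 5 and key at position 2, both layouts.
score_paper = dot(rope_interleaved(q, 5), rope_interleaved(k, 2))
score_half = dot(rope_half(to_half(q), 5), rope_half(to_half(k), 2))
print(math.isclose(score_paper, score_half, rel_tol=1e-9))
```

Because the query and key projections are learned, a fixed reordering of their output dimensions can be absorbed into the weights during training. One caveat: weights trained with one layout would presumably need the matching permutation applied before being used with the other.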