baichuan-inc / Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.
https://huggingface.co/baichuan-inc/baichuan-7B
Apache License 2.0
5.67k stars, 506 forks

[Question] The RoPE implementation is inconsistent with the paper #136

Open zehmaaa opened 11 months ago

zehmaaa commented 11 months ago

Questions

Why does the implementation here differ from the one in the paper?

def rotate_half(x):
    """Rotates half the hidden dims of the input."""
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2:]
    return torch.cat((-x2, x1), dim=-1)
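To make the behavior concrete, here is a minimal pure-Python sketch of the same operation on a plain list. The name `rotate_half_list` is hypothetical and is not part of the repository; it only mirrors the slicing and concatenation of the torch version above.

```python
def rotate_half_list(x):
    """List-based mirror of rotate_half (hypothetical helper, for illustration)."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]          # first half, second half
    return [-v for v in x2] + x1         # negated second half, then first half


print(rotate_half_list([1, 2, 3, 4]))    # [-3, -4, 1, 2]
```

Element `i` of the result comes from element `i + d/2` of the input (negated), and element `i + d/2` comes from element `i`, so dimension `i` is paired with dimension `i + d/2` rather than with its immediate neighbor.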

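The computation in the paper is: [image]

With this implementation, the final result would be: [image]

I see that the Hugging Face implementation does the same thing, and I am curious why this implementation was chosen.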

xinge333 commented 2 months ago

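The dimensions (neurons) of an embedding have no inherent order, so it does not matter which dimensions are paired: picking any half of them for the sign flip works.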
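The claim above can be checked numerically. The following is a minimal sketch in pure Python, assuming the paper pairs adjacent dimensions `(x[2i], x[2i+1])` while the `rotate_half` style pairs `(x[i], x[i + d/2])`, both with the usual frequencies `base ** (-2i/d)`. The function names `rope_interleaved`, `rope_half_split`, and `to_half_split` are hypothetical and do not come from the repository.

```python
import math


def rope_interleaved(x, pos, base=10000.0):
    """Paper-style pairing (assumed): rotate adjacent pairs (x[2i], x[2i+1])."""
    d = len(x)
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def rope_half_split(x, pos, base=10000.0):
    """rotate_half-style pairing: rotate pairs (x[i], x[i + d/2])."""
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s           # x*cos + rotate_half(x)*sin, first half
        out[i + half] = b * c + a * s    # same expression, second half
    return out


def to_half_split(x):
    """Reorder dimensions: even indices first, then odd indices."""
    return x[0::2] + x[1::2]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


q = [0.1, -0.2, 0.3, 0.4, -0.5, 0.6, 0.7, -0.8]
k = [0.9, 0.8, -0.7, 0.6, 0.5, -0.4, 0.3, 0.2]

# Attention score with paper-style pairing at positions 5 and 3 ...
score_paper = dot(rope_interleaved(q, 5), rope_interleaved(k, 3))
# ... and with rotate_half-style pairing on the reordered vectors.
score_half = dot(rope_half_split(to_half_split(q), 5),
                 rope_half_split(to_half_split(k), 3))
print(math.isclose(score_paper, score_half))  # True
```

Under these assumptions the two variants differ only by a fixed reordering of dimensions. Because that reordering can be absorbed into the learned query and key projection weights, a model trained from scratch with either variant computes the same family of attention scores. The reordering does matter when loading weights trained with one convention into code that uses the other.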