"Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, Zhangyang Wang.
Error in the implementation of MsPoELlamaRotaryEmbedding #5
The following is your implementation of MsPoELlamaRotaryEmbedding:
However, since x has shape [bs, num_attention_heads, seq_len, head_size], shouldn't the correct implementation be:
This would also align with the original implementation of the RoPE embedding in LLaMA.
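For context, here is a minimal sketch of a shape-consistent, LLaMA-style rotary embedding whose forward reads the sequence length from x.shape[-2], matching the [bs, num_attention_heads, seq_len, head_size] layout described above. The class name, cache layout, and helper names are illustrative assumptions, not the code from this repository:

```python
import torch
import torch.nn as nn


class LlamaStyleRotaryEmbedding(nn.Module):
    # Hypothetical sketch, not the repo's actual class: a LLaMA-style
    # rotary embedding whose forward infers seq_len from x.shape[-2].

    def __init__(self, dim, max_position_embeddings=2048, base=10000):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)
        self._set_cos_sin_cache(max_position_embeddings)

    def _set_cos_sin_cache(self, seq_len, device=None):
        # Precompute cos/sin tables for positions 0..seq_len-1.
        self.max_seq_len_cached = seq_len
        t = torch.arange(seq_len, device=device, dtype=torch.float32)
        freqs = torch.outer(t, self.inv_freq.to(t.device))  # [seq_len, dim/2]
        emb = torch.cat((freqs, freqs), dim=-1)             # [seq_len, dim]
        self.register_buffer("cos_cached", emb.cos(), persistent=False)
        self.register_buffer("sin_cached", emb.sin(), persistent=False)

    def forward(self, x, seq_len=None):
        # x: [bs, num_attention_heads, seq_len, head_size].
        # The sequence length lives in dim -2, not -1, so it must be
        # read from x.shape[-2] when not passed explicitly.
        if seq_len is None:
            seq_len = x.shape[-2]
        if seq_len > self.max_seq_len_cached:
            self._set_cos_sin_cache(seq_len, device=x.device)
        return (
            self.cos_cached[:seq_len].to(dtype=x.dtype),
            self.sin_cached[:seq_len].to(dtype=x.dtype),
        )
```

With this layout, cos_cached[:seq_len] slices the sequence dimension of the cached tables, which is what keeps the returned cos/sin consistent with the [bs, num_attention_heads, seq_len, head_size] convention of the query/key tensors.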