xPos is an improved version of the original RoPE from the RoFormer paper (i.e. a modification of ggml_rope with the !is_neox flag). I'm unaware of published models using it yet, but it is important because it's the positional embedding employed by the RetNet paper (which supports O(1) inference, i.e. independent of context length, with non-degraded quality; a potential superior replacement for attention with KV caches).
For a quick Python comparison of the original RoPE, GPT-NeoX RoPE and xPos RoPE see: https://github.com/jploski/RotaryEmbedding
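To illustrate the idea, here is a minimal NumPy sketch of xPos in the interleaved (original-RoPE, !is_neox) layout. The function name and parameter defaults (scale_base=512, gamma=0.4, taken from the xPos paper's suggested values) are my own choices for illustration, not part of the ggml API; keys use the inverse scale (downscale=True) so the decay depends only on the relative distance between query and key positions.

```python
import numpy as np

def xpos_rope(x, pos, base=10000.0, scale_base=512.0, gamma=0.4, downscale=False):
    """Sketch of xPos rotary embedding, interleaved-pair layout.
    x: (seq_len, head_dim) with even head_dim; pos: (seq_len,) positions.
    downscale=True applies the inverse xPos scale (used for keys)."""
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) / half)                 # standard RoPE frequencies
    angles = pos[:, None] * freqs[None, :]                    # (seq_len, half)
    # xPos per-pair decay factor zeta_i, raised to pos/scale_base
    zeta = (np.arange(half) / half + gamma) / (1.0 + gamma)
    power = pos[:, None] / scale_base
    scale = zeta[None, :] ** (-power if downscale else power)
    cos = np.cos(angles) * scale
    sin = np.sin(angles) * scale
    x1, x2 = x[:, 0::2], x[:, 1::2]                           # interleaved pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

At position 0 the transform is the identity, and the query/key inner product is invariant under a common shift of both positions, which is the relative-position property xPos shares with plain RoPE (plus the added long-range decay).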
Based on the above, I will create a pull request with an initial (non-CUDA) implementation.