Closed · joerunde closed this 7 months ago
Implements a new LinearScalingPositionRotaryEmbedding layer that supports linear scaling of position ids when computing rotary embeddings. Without this, models with a linear rope_scaling configuration would load fine but produce garbage output.
The changes were made from inspection of Transformers' LlamaLinearScalingRotaryEmbedding implementation. In short, the position ids are divided by the scaling factor before the cosine and sine tables are computed.
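For reference, here is a minimal sketch of that idea (class name, shapes, and the forward signature are illustrative only, not the actual layer added in this PR):

```python
import torch


class LinearScalingRotaryEmbeddingSketch(torch.nn.Module):
    """Illustrative rotary embedding with linear position-id scaling.

    Positions are divided by `scaling_factor` before the cos/sin tables
    are computed, which is the core of linear rope scaling.
    """

    def __init__(self, dim: int, base: float = 10000.0, scaling_factor: float = 1.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq)
        self.scaling_factor = scaling_factor

    def forward(self, position_ids: torch.Tensor):
        # Key step: scale the position ids *before* taking cos/sin.
        positions = position_ids.float() / self.scaling_factor
        freqs = torch.outer(positions, self.inv_freq)
        return freqs.cos(), freqs.sin()


# With scaling_factor=2.0, position 4096 is treated like position 2048,
# stretching the original context window.
cos, sin = LinearScalingRotaryEmbeddingSketch(dim=128, scaling_factor=2.0)(
    torch.arange(8)
)
```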