Closed csjackson0 closed 9 months ago
This PR is a follow up to #34 and integrates Linear and DynamicNTK Scaling Rotary position encoding.
Created test tensors using `torch.ones(1, 12, 10, 64)` as input to the original implementations at https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L128
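For context, a minimal sketch of the two scaling variants this PR integrates, mirroring the rotary embedding logic in the linked `modeling_llama.py` (function names here are hypothetical, not the PR's actual API): Linear scaling divides the position indices by a scaling factor, while DynamicNTK rescales the frequency base once the sequence exceeds the trained context length.

```python
import torch

def rope_cos_sin(dim, seq_len, base=10000.0, scaling_factor=1.0,
                 dynamic_ntk=False, max_position_embeddings=2048):
    # Hypothetical helper sketching Linear vs. DynamicNTK scaling.
    if dynamic_ntk and seq_len > max_position_embeddings:
        # DynamicNTK: grow the base when the sequence exceeds the trained length
        base = base * ((scaling_factor * seq_len / max_position_embeddings)
                       - (scaling_factor - 1)) ** (dim / (dim - 2))
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    t = torch.arange(seq_len).float()
    if not dynamic_ntk:
        t = t / scaling_factor  # Linear scaling: stretch position indices
    freqs = torch.outer(t, inv_freq)        # (seq_len, dim/2)
    emb = torch.cat((freqs, freqs), dim=-1)  # (seq_len, dim)
    return emb.cos(), emb.sin()

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q, cos, sin):
    # q: (batch, heads, seq, head_dim); cos/sin broadcast over batch and heads
    return q * cos + rotate_half(q) * sin

# Same shape as the test tensor described above
q = torch.ones(1, 12, 10, 64)
cos, sin = rope_cos_sin(dim=64, seq_len=10)
q_rot = apply_rope(q, cos, sin)
print(q_rot.shape)  # torch.Size([1, 12, 10, 64])
```

With `scaling_factor=1.0` and `dynamic_ntk=False` this reduces to the unscaled rotary embedding, so outputs can be checked directly against the original HuggingFace implementation.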