--rope-scaling {none,linear,yarn}
        RoPE frequency scaling method; defaults to linear unless specified
        by the model.
--rope-scale N
        RoPE context scaling factor; expands the context by a factor of N,
        where N is the linear scaling factor used by the fine-tuned model.
        Some fine-tuned models extend the context length by scaling RoPE.
        For example, if the original pre-trained model has a context length
        (max sequence length) of 4096 (4k) and the fine-tuned model has 32k,
        that is a scaling factor of 8, and it should work by setting
        --ctx-size to 32768 (32k) and --rope-scale to 8.
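For illustration, following the documentation above, a model whose context was extended from 4096 to 32768 during fine-tuning would be launched along these lines (the model filename is a placeholder, not one from this report):

    llamafile -m some-32k-model.gguf -c 32768 --rope-scaling linear --rope-scale 8

Here 8 is simply 32768 / 4096, the linear scaling factor used by the fine-tuned model.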
Contact Details
No response
What happened?
C:\llamafile-0.8.13\bin>llamafile.exe -m \models\Karsh-CAI\Qwen2.5-32B-AGI-Q4_K_M-GGUF\qwen2.5-32b-agi-q4_k_m.gguf -ngl 99 -c 65536 --rope-scaling yarn --rope-scale 4
error: unknown argument: --rope-scale
llamafile -h shows both flags as valid arguments.
Version
llamafile-0.8.13
What operating system are you seeing the problem on?
No response
Relevant log output
No response