Describe the Issue
llama.cpp exposes the options --grp-attn-n and --grp-attn-w for the group size and neighbor window size hyperparameters from the SelfExtend paper.
Without those parameters, models with short native context windows, such as Gemma 2, cannot have their context extended without expensive fine-tuning.
Please consider exposing those options via the koboldcpp front-end as well.
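For reference, here is a minimal sketch of how these options are used on the llama.cpp side today. The binary name, model file, and values are placeholders, and the note about --grp-attn-w being a multiple of --grp-attn-n is my recollection of llama.cpp's check, not something stated in this issue:

```sh
# SelfExtend / group attention in llama.cpp (example values only):
#   --grp-attn-n : group size factor; usable context grows roughly to
#                  (trained context) * grp-attn-n
#   --grp-attn-w : neighbor window width (reportedly must be a multiple of grp-attn-n)
./llama-cli -m gemma-2-9b-it-Q4_K_M.gguf \
    -c 16384 \
    --grp-attn-n 2 \
    --grp-attn-w 2048
```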
You can already extend the context size with --contextsize [desired max context length], which scales the context automatically using Gradient RoPE scaling.
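For example, assuming a recent koboldcpp build (model file and sizes are placeholders, and the manual override is only relevant if --ropeconfig is available in your build):

```sh
# Current koboldcpp approach: request a larger context and let the automatic
# RoPE scaling pick parameters based on the model's trained context length.
python koboldcpp.py --model gemma-2-9b-it-Q4_K_M.gguf --contextsize 16384

# RoPE parameters can also be overridden manually if the automatic values
# are not ideal for a particular model:
python koboldcpp.py --model gemma-2-9b-it-Q4_K_M.gguf --contextsize 16384 \
    --ropeconfig 0.5 10000
```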