racheandre closed this issue 1 year ago
Hi, the model is fine; koboldcpp has recently changed the way RoPE works.
For this model, you will now need to use linear RoPE scaling via --ropeconfig
instead. Please run with the command:
python koboldcpp.py --stream --model longchat-13b-16k.ggmlv3.q3_K_S.bin --useclblast 0 0 --contextsize 8192 --gpulayers 43 --threads 4 --ropeconfig 0.25 10000
Please let me know if this works for you.
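For context, `--ropeconfig 0.25 10000` sets a linear RoPE scale factor of 0.25 with the usual base frequency of 10000, which stretches the effective context window by 1/0.25 = 4x. Here is a minimal sketch of what linear scaling does to the rotary position angles; the function name and simplified math are illustrative, not koboldcpp's actual implementation:

```python
def rope_angles(pos, dim, scale=1.0, base=10000.0):
    """Rotation angles for one token position under linear RoPE scaling.

    Illustrative sketch: linear scaling simply multiplies the position
    index by `scale` before computing the standard RoPE angles, so a
    scale of 0.25 makes position 16384 look like position 4096 to the
    model. `dim` is the head dimension; angles come in pairs.
    """
    scaled_pos = pos * scale
    return [scaled_pos / (base ** (2 * i / dim)) for i in range(dim // 2)]

# With scale 0.25, position 8000 produces the same angles as
# unscaled position 2000 (8000 * 0.25 == 2000):
assert rope_angles(8000, 128, scale=0.25) == rope_angles(2000, 128, scale=1.0)
```

This is why the flag depends on the model: a model fine-tuned for 16k context (like this one) expects the 0.25 compression, while a stock 4k model would lose precision if you applied it.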
It works, thank you! Seems like I missed another big change after not touching LLMs for a month. Can the dynamic RoPE settings also be applied to the new LLAMA2 models with 4k context size?
When using the Longchat model with the latest version
94e0a06daf9605b3fe23252bf3e4aa123f6027f4
it generates gibberish. I found that the version e1a7042943a0016dc979554372641112150dc346 from 2nd July works, though it may not be the one that introduced the breaking change, since I don't have time to bisect.
Chat Result
Environment
python koboldcpp.py --stream --model longchat-13b-16k.ggmlv3.q3_K_S.bin --useclblast 0 0 --contextsize 8192 --gpulayers 43 --threads 4
TheBloke/LongChat-13B-GGML