Closed: SimplyCorbett closed this issue 6 months ago
I'm using the API in SillyTavern, with a reverse proxy set up through nginx for access from the outside world.
With this setup, every 2-3 replies requires reprocessing what appears to be the entire story, which can take a while.
This wasn't happening with SillyTavern when using koboldcpp on LAN.
I also enabled the multiuser interface, so I'm going to assume the issue is with that.
Full command:
./koboldcpp.py --model /Volumes/fastarray/models/mixtral/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf --launch --noblas --gpulayers 100 --contextsize 6144 --threads 11 --usemlock --quiet --password passwordhere --port porthere --host 127.0.0.1 --multiuser 3
I don't think a proxy will affect regeneration behavior. That's entirely based on what's in the context. Did you resolve this issue?
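To illustrate why the context rather than the proxy would decide this, here is a minimal sketch. It assumes the backend can only reuse cached work for the longest token prefix shared between the previous prompt and the new one. The function name and the token lists are hypothetical and are not taken from koboldcpp's code.

```python
def reusable_prefix_len(cached_tokens, new_tokens):
    """Count how many leading tokens of the new prompt match the cached prompt."""
    n = 0
    for old, new in zip(cached_tokens, new_tokens):
        if old != new:
            break
        n += 1
    return n


# Hypothetical token IDs standing in for a tokenized story.
cached = [1, 2, 3, 4, 5, 6]

# Case 1: the frontend only appends new text, so the whole cache is reusable.
appended = [1, 2, 3, 4, 5, 6, 7, 8]

# Case 2: something near the start of the prompt changed, so almost
# everything after that point would have to be processed again.
edited_early = [1, 9, 3, 4, 5, 6, 7, 8]

for name, prompt in (("appended", appended), ("edited_early", edited_early)):
    keep = reusable_prefix_len(cached, prompt)
    print(f"{name}: reuse {keep} tokens, reprocess {len(prompt) - keep}")
```

Under that assumption, a change early in the prompt forces most of the story to be reprocessed no matter how the request reached the server.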