LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Regenerating context/chat every 3-4 replies with SillyTavern & API #756

Closed · SimplyCorbett closed this issue 6 months ago

SimplyCorbett commented 6 months ago

I'm using the API in SillyTavern. I have a reverse proxy set up through nginx for the outside world.

With this setup, every 2-3 replies require reprocessing what appears to be the entire story, which can take a while.

This wasn't happening when using koboldcpp with SillyTavern over LAN.

I also enabled multiuser mode, so I'm going to assume the issue lies there.

Full command:

./koboldcpp.py --model /Volumes/fastarray/models/mixtral/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf --launch --noblas --gpulayers 100 --contextsize 6144 --threads 11 --usemlock --quiet --password passwordhere --port porthere --host 127.0.0.1 --multiuser 3
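For context, a reverse proxy in front of koboldcpp for this kind of setup might look like the sketch below. This is an assumption about the configuration, not the poster's actual file; the server name and upstream port are placeholders. Disabling `proxy_buffering` matters for streamed generations, since nginx otherwise buffers the response:

```nginx
server {
    listen 443 ssl;
    server_name example.com;  # placeholder

    location / {
        # koboldcpp bound to localhost; port is a placeholder
        proxy_pass http://127.0.0.1:5001;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        # avoid buffering so streamed tokens reach the client promptly
        proxy_buffering off;
        # long generations can exceed the default 60s read timeout
        proxy_read_timeout 300s;
    }
}
```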

LostRuins commented 6 months ago

I don't think a proxy will affect regeneration behavior. That's entirely based on what's in the context. Did you resolve this issue?
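The reuse behavior described above can be illustrated with a toy prefix check. This is a hypothetical sketch, not koboldcpp's actual cache logic: the backend can skip reprocessing only the tokens that match the previously processed context from the start, so if the frontend trims or rewrites early messages (e.g. to fit the context window), nearly the whole prompt must be reprocessed:

```python
def common_prefix_len(cached_tokens, new_tokens):
    """Count how many leading tokens match the previously processed context."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

# Context processed on the previous request (toy tokens).
cached = ["<s>", "User:", "Hello", "Bot:", "Hi", "User:", "Bye"]

# Appending a reply keeps the whole prefix reusable.
appended = cached + ["Bot:"]
print(common_prefix_len(cached, appended))  # 7: only the new token needs work

# Trimming an early message changes the prompt near the start,
# so almost everything must be reprocessed.
trimmed = ["<s>", "User:", "Bye"]
print(common_prefix_len(cached, trimmed))   # 2: most of the cache is useless
```

This is consistent with the reply above: the proxy only moves bytes, while reuse is decided entirely by whether the new prompt's leading tokens match what was processed before.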