I've tried a few 13B models, loaded in 4-bit (vicuna, gptxalpaca and supercot). After about 5 or 6 good, creative replies, the model starts repeating its previous output, regardless of what I send next.
Backend: KoboldAI (Occam's 4bit fork)
Frontend: SillyTavern