Open KizzyCode opened 6 months ago
Are you using streaming?
Positive, I'm using streaming. Also, in like 1 out of 4 cases it seems to abort correctly, so I'm a bit puzzled... This is my koboldcpp config: miqu-1-70b.q5_K_M.kcpps.zip. I don't know of a good way to export SillyTavern's or LibreChat's config, but both are pretty vanilla.
What versions of SillyTavern and koboldcpp are you using? Did you select the "KoboldCpp" option under the Text Completion endpoints?
API type is Text Completion/KoboldCpp:
Hmm, that is odd then. I'm not very sure, but I'll look into it. Did you try to see if mainline koboldcpp works compared to the ROCm fork?
Nope, I didn't; but I can try later, just to be sure it's not related to the ROCm patches. Will give an update :)
Happens too with the current vanilla release.
If I use koboldcpp's OpenAI API via SillyTavern or LibreChat, and then cancel the request via the stop buttons, more often than not, koboldcpp happily keeps generating new tokens until either the token limit is reached or it comes to a conclusion.
I'm not 100% sure if that's a problem with koboldcpp; but since both frontends seem to work with other backends and fail with koboldcpp, I'd guess it is the outlier here.
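For context, the stop buttons in both frontends effectively just close the HTTP connection mid-stream, and a backend can notice that because its next write to the dead socket fails. Here's a self-contained sketch of that mechanism (the server below is a stand-in that emulates token streaming, not koboldcpp itself; ports and payloads are made up):

```python
# Sketch: how a streaming backend can detect a client disconnect.
# The stand-in server streams "tokens" until its write raises a broken
# pipe -- the point where a real backend should abort generation.
import http.server
import threading
import time
import urllib.request

disconnect_detected = threading.Event()

class StreamHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.end_headers()
        try:
            while True:  # emulate an endless token stream
                self.wfile.write(b"data: token\n\n")
                self.wfile.flush()
                time.sleep(0.05)
        except (BrokenPipeError, ConnectionResetError):
            # Client hung up -- generation should stop here.
            disconnect_detected.set()

    def log_message(self, *args):  # silence request logging
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), StreamHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: read a few chunks, then close the connection,
# which is what the frontends' stop buttons amount to.
resp = urllib.request.urlopen(f"http://127.0.0.1:{server.server_address[1]}/")
for _ in range(3):
    resp.readline()
resp.close()

assert disconnect_detected.wait(timeout=2), "server never noticed the disconnect"
server.shutdown()
```

The reported bug is that koboldcpp apparently keeps generating past this point instead of aborting, even though the socket is gone.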
Steps to reproduce: