LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0
4.97k stars 349 forks

Request cancellation via OpenAI API does not seem to work #745

Open KizzyCode opened 6 months ago

KizzyCode commented 6 months ago

If I use koboldcpp's OpenAI API via SillyTavern or LibreChat, and then cancel the request via the stop buttons, more often than not, koboldcpp happily keeps generating new tokens until either the token limit is reached or it comes to a conclusion.

I'm not 100% sure if that's a problem with koboldcpp; but since both frontends seem to work with other backends and fail with koboldcpp, I'd guess it is the outlier here.

Steps to reproduce:

  1. Setup SillyTavern or LibreChat
  2. Connect it to koboldcpp via the OpenAI v1 API
  3. Try to abort a request
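For context on what "abort" means here: with the OpenAI v1 streaming API, frontends like SillyTavern and LibreChat cancel a request by simply closing the HTTP connection mid-stream, and the backend is expected to notice the dropped connection and stop generating. Below is a minimal reproduction sketch, assuming koboldcpp's default port 5001 and the standard OpenAI v1 completions endpoint (the helper names are illustrative, not part of any API):

```python
import json

def build_streaming_payload(prompt, max_tokens=512):
    """Build an OpenAI-v1-style completion request with streaming enabled."""
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "stream": True,  # tokens arrive incrementally as server-sent events
    }

def parse_sse_line(line):
    """Extract the JSON payload from a 'data: {...}' SSE line.
    Returns None for keep-alives, blanks, and the final '[DONE]' sentinel."""
    if not line.startswith("data: "):
        return None
    body = line[len("data: "):].strip()
    if body == "[DONE]":
        return None
    return json.loads(body)

def reproduce(port=5001):
    """Open a streaming completion, read a few chunks, then close the
    connection. A correct backend should stop generating at that point;
    the reported bug is that koboldcpp often keeps going. Requires the
    'requests' package and a running koboldcpp instance."""
    import requests
    resp = requests.post(
        f"http://localhost:{port}/v1/completions",
        json=build_streaming_payload("Once upon a time"),
        stream=True,
    )
    for i, line in enumerate(resp.iter_lines(decode_unicode=True)):
        if parse_sse_line(line or ""):
            print("received a token chunk")
        if i >= 3:
            resp.close()  # cancel: simulates the frontend's stop button
            break
```

Whether the backend actually halts after `resp.close()` has to be checked on the server side (e.g. by watching koboldcpp's console output for continued token generation).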
LostRuins commented 6 months ago

Are you using streaming?

KizzyCode commented 6 months ago

Positive, I'm using streaming. Also, in like 1 out of 4 cases it seems to abort correctly, so I'm a bit puzzled... This is my koboldcpp config: miqu-1-70b.q5_K_M.kcpps.zip. I don't know of a good way to export SillyTavern's or LibreChat's config, but both are pretty vanilla.

LostRuins commented 6 months ago

What version of sillytavern and koboldcpp are you using? Did you select the "koboldCpp" option under text-completions endpoints?

KizzyCode commented 6 months ago

API type is Text Completion/KoboldCpp (screenshot attached: 2024-03-14 at 15:15:13)

LostRuins commented 6 months ago

Hmm, that is odd then. I'm not very sure, but I'll look into it. Did you try to see if mainline koboldcpp works compared to the rocm fork?

KizzyCode commented 6 months ago

Nope, I didn't; but I can try later, just to be sure it's not related to the rocm patches. Will give an update :)

KizzyCode commented 6 months ago

Happens too with the current vanilla release.
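For anyone debugging this: the expected server-side behavior is that the generation loop checks for an abort signal between tokens, where the signal is set when the client disconnects (or an explicit abort request arrives). This is an illustrative sketch of that pattern, not koboldcpp's actual code:

```python
import threading

def generate_stream(tokens, abort_event):
    """Yield tokens one at a time, stopping as soon as the abort signal
    is set (e.g. because the client hung up mid-stream). A backend that
    skips this check keeps generating until max tokens, which matches
    the behavior reported in this issue."""
    for tok in tokens:
        if abort_event.is_set():
            break  # a correct backend bails out here
        yield tok

# Simulate a client that cancels after receiving two tokens.
abort = threading.Event()
produced = []
for i, tok in enumerate(generate_stream(["a", "b", "c", "d", "e"], abort)):
    produced.append(tok)
    if i == 1:
        abort.set()  # the stop button / dropped connection fires here
# produced is ["a", "b"]; "c" through "e" are never generated.
```

As a workaround while this is open, it may also be worth checking whether koboldcpp's native abort endpoint (if I recall correctly, `/api/extra/abort` on its extra API) stops generation reliably, since that would narrow the bug down to disconnect detection on the OpenAI-compatible streaming path.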