Sebby37 / Dead-Internet

Y'all thought the dead internet theory wasn't real, but HERE IT IS

Nothing happens after generation with koboldcpp #1

Open pbz134 opened 5 months ago

pbz134 commented 5 months ago

I modified the port in "ReaperEngine.py" so it can work with koboldcpp; however, nothing happens after the text generation finishes. Video is here: https://github.com/Sebby37/Dead-Internet/assets/53271427/c1376062-f1f9-4e2e-84c9-132ec2b6c0dd
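For reference, the kind of change described above amounts to pointing the OpenAI client at koboldcpp's OpenAI-compatible endpoint. A minimal sketch, assuming koboldcpp's default port (5001) and the standard `openai` client; the exact variable names in ReaperEngine.py may differ:

```python
from openai import OpenAI

# Point the client at koboldcpp's OpenAI-compatible API (served under /v1 on port 5001 by default)
client = OpenAI(
    base_url="http://localhost:5001/v1",
    api_key="none",  # local servers ignore the key, but the client requires some value
)
```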

(screenshot attached: grafik)

In the cmd terminal (not the koboldcpp terminal), this pops up after the generation finishes:

127.0.0.1 - - [24/Apr/2024 21:52:37] "GET / HTTP/1.1" 200 -
[2024-04-24 21:52:48,432] ERROR in app: Exception on / [GET]
Traceback (most recent call last):
  File "D:\Programme\Miniconda3\envs\DeadInternet\lib\site-packages\flask\app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
  File "D:\Programme\Miniconda3\envs\DeadInternet\lib\site-packages\flask\app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "D:\Programme\Miniconda3\envs\DeadInternet\lib\site-packages\flask\app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "D:\Programme\Miniconda3\envs\DeadInternet\lib\site-packages\flask\app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "D:\Dead-Internet\main.py", line 17, in index
    return engine.get_search(query)
  File "D:\Dead-Internet\ReaperEngine.py", line 76, in get_search
    search_page_completion = self.client.chat.completions.create(messages=[
  File "D:\Programme\Miniconda3\envs\DeadInternet\lib\site-packages\openai\_utils\_utils.py", line 277, in wrapper
    return func(*args, **kwargs)
  File "D:\Programme\Miniconda3\envs\DeadInternet\lib\site-packages\openai\resources\chat\completions.py", line 579, in create
    return self._post(
  File "D:\Programme\Miniconda3\envs\DeadInternet\lib\site-packages\openai\_base_client.py", line 1232, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "D:\Programme\Miniconda3\envs\DeadInternet\lib\site-packages\openai\_base_client.py", line 921, in request
    return self._request(
  File "D:\Programme\Miniconda3\envs\DeadInternet\lib\site-packages\openai\_base_client.py", line 997, in _request
    return self._retry_request(
  File "D:\Programme\Miniconda3\envs\DeadInternet\lib\site-packages\openai\_base_client.py", line 1045, in _retry_request
    return self._request(
  File "D:\Programme\Miniconda3\envs\DeadInternet\lib\site-packages\openai\_base_client.py", line 997, in _request
    return self._retry_request(
  File "D:\Programme\Miniconda3\envs\DeadInternet\lib\site-packages\openai\_base_client.py", line 1045, in _retry_request
    return self._request(
  File "D:\Programme\Miniconda3\envs\DeadInternet\lib\site-packages\openai\_base_client.py", line 1012, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 503 - {'detail': {'msg': 'Server is busy; please try again later.', 'type': 'service_unavailable'}}
127.0.0.1 - - [24/Apr/2024 21:52:48] "GET /?query=Cake HTTP/1.1" 500 -
127.0.0.1 - - [24/Apr/2024 21:58:21] "GET /favicon.ico HTTP/1.1" 200 -
127.0.0.1 - - [24/Apr/2024 22:04:56] "GET /?query=Cake HTTP/1.1" 200 -
127.0.0.1 - - [24/Apr/2024 22:05:11] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [24/Apr/2024 22:05:11] "GET /favicon.ico HTTP/1.1" 200 -
127.0.0.1 - - [24/Apr/2024 22:12:09] "GET /?query=Cake HTTP/1.1" 200 -

Sebby37 commented 5 months ago

I don't personally use Koboldcpp, but it looks like it's not properly handling multiple requests at once. I'll take a proper look once I get on my computer, but in the meantime try the llama.cpp server, or text-generation-webui if you have it installed.
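If you go the llama.cpp route, a rough example launch (binary name and flags as of the llama.cpp server example around this time; check --help for your build):

```
./server -m ./models/your-model.gguf -c 8192 --port 8080
```

Then point the client in ReaperEngine.py at http://localhost:8080/v1, since the server exposes an OpenAI-compatible endpoint there.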

Zetaphor commented 5 months ago

LM Studio is another good option. Internally it uses llama.cpp.

Sebby37 commented 5 months ago

I just did a small test run with koboldcpp and it seems to 503 whenever max_tokens is set greater than the context size koboldcpp is running with, which it defaults to 2048. Try increasing the context size or decreasing max_tokens and see if that fixes it.
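Concretely, the two knobs look something like this. This is a sketch, not the exact ReaperEngine.py call; the parameter names are the standard OpenAI client's, and koboldcpp simply uses whatever model it has loaded regardless of the model field:

```python
# Keep the request's max_tokens under the context size koboldcpp was launched with
completion = client.chat.completions.create(
    model="koboldcpp",
    messages=[{"role": "user", "content": "Cake"}],
    max_tokens=1024,  # stays below koboldcpp's default 2048-token context
)
```

The other option is launching koboldcpp with a larger context, e.g. `--contextsize 8192`.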

pbz134 commented 5 months ago

> I just did a small test run with koboldcpp and it seems to 503 whenever max_tokens is set greater than the context size koboldcpp is running with, which it defaults to 2048. Try increasing the context size or decreasing max_tokens and see if that fixes it.

Hello, I used llama-3-8b, which has a default context size of 8k tokens, so that shouldn't be what's causing the 503 error. https://huggingface.co/NousResearch/Meta-Llama-3-8B-Instruct-GGUF/tree/main

henk717 commented 4 months ago

KoboldCpp should be able to handle multiple requests just fine, but make sure that multiuser is enabled. It's a more powerful API than the one in LM Studio and should be capable of doing everything the others do.
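For example, a launch along these lines enables multiuser mode and a larger context (flag names from koboldcpp's CLI; verify against --help for your version):

```
python koboldcpp.py --model ./your-model.gguf --contextsize 8192 --multiuser
```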