Issues with streaming (local LLM)

kirkog86 commented 10 months ago

I use llama.cpp.python[server] running locally with LLM from hugging face + shell_gpt. If I disable the stream in $HOME/.local/lib/python3.10/site-packages/sgpt/client.py (line 41), I get complete answers as one block. However, if I keep the streaming on, the answers hang in the middle of the generation with the following errors. Server-side: Disconnected from the client (via refresh/close) Address(host='127.0.0.1', port=40490)

Client-side: sgpt --no-cache "How to open port 5900 on local firewall?" To open port 5900 on the local firewall in Ubuntu 22.04.3 LTS, you can use the following command:

sudo ufw allow to port 5900

Replace with the IP address or network of the machine you want to allow incoming traffic from, and press Enter. This╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮ │ /home//.local/lib/python3.10/site-packages/sgpt/app.py:167 in main │ │ │ │ 164 │ │ │ caching=cache, │ │ 165 │ │ ) │ │ 166 │ else: │ │ ❱ 167 │ │ full_completion = DefaultHandler(role_class).handle( │ │ 168 │ │ │ prompt, │ │ 169 │ │ │ model=model, │ │ 170 │ │ │ temperature=temperature, │ │ │ │ ╭─────────────────────────────── locals ────────────────────────────────╮ │ │ │ cache = False │ │ │ │ chat = None │ │ │ │ code = False │ │ │ │ create_role = None │ │ │ │ describe_shell = False │ │ │ │ editor = False │ │ │ │ install_integration = None │ │ │ │ list_chats = None │ │ │ │ list_roles = None │ │ │ │ model = '"model"' │ │ │ │ prompt = 'How to open port 5900 on local firewall?' │ │ │ │ repl = None │ │ │ │ role = None │ │ │ │ role_class = <sgpt.role.SystemRole object at 0x7fe9ce59f2e0> │ │ │ │ shell = False │ │ │ │ show_chat = None │ │ │ │ show_role = None │ │ │ │ stdin_passed = False │ │ │ │ temperature = 0.1 │ │ │ │ top_probability = 1.0 │ │ │ ╰───────────────────────────────────────────────────────────────────────╯ │ │ │ │ /home//.local/lib/python3.10/site-packages/sgpt/handlers/handler.py:33 in handle │ │ │ │ 30 │ │ stream = cfg.get("DISABLE_STREAMING") == "false" │ │ 31 │ │ if not stream: │ │ 32 │ │ │ typer.echo("Loading...\r", nl=False) │ │ ❱ 33 │ │ for word in self.get_completion(messages=messages, kwargs): │ │ 34 │ │ │ typer.secho(word, fg=self.color, bold=True, nl=False) │ │ 35 │ │ │ full_completion += word │ │ 36 │ │ typer.echo("\033[K" if not stream else "") # Overwrite "loading..." │ │ │ │ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │ │ │ full_completion = ' To open port 5900 on the local firewall in Ubuntu 22.04.3 LTS, you can │ │ │ │ use the'+222 │ │ │ │ kwargs = { │ │ │ │ │ 'model': '"model"', │ │ │ │ │ 'temperature': 0.1, │ │ │ │ │ 'top_probability': 1.0, │ │ │ │ │ 'caching': False │ │ │ │ } │ │ │ │ messages = [ │ │ │ │ │ { │ │ │ │ │ │ 'role': 'user', │ │ │ │ │ │ 'content': '###\nRole name: default\nYou are Command Line App │ │ │ │ ShellGPT, a programming and syst'+358 │ │ │ │ │ } │ │ │ │ ] │ │ │ │ prompt = 'How to open port 5900 on local firewall?' │ │ │ │ self = <sgpt.handlers.default_handler.DefaultHandler object at 0x7fe9ce59e020> │ │ │ │ stream = True │ │ │ │ word = 'This' │ │ │ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │ │ │ │ /home//.local/lib/python3.10/site-packages/sgpt/handlers/handler.py:25 in get_completion │ │ │ │ 22 │ │ raise NotImplementedError │ │ 23 │ │ │ 24 │ def get_completion(self, kwargs: Any) -> Generator[str, None, None]: │ │ ❱ 25 │ │ yield from self.client.get_completion(kwargs) │ │ 26 │ │ │ 27 │ def handle(self, prompt: str, kwargs: Any) -> str: │ │ 28 │ │ messages = self.make_messages(self.make_prompt(prompt)) │ │ │ │ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │ │ │ kwargs = { │ │ │ │ │ 'messages': [ │ │ │ │ │ │ { │ │ │ │ │ │ │ 'role': 'user', │ │ │ │ │ │ │ 'content': '###\nRole name: default\nYou are Command Line App ShellGPT, │ │ │ │ a programming and syst'+358 │ │ │ │ │ │ } │ │ │ │ │ ], │ │ │ │ │ 'model': '"model"', │ │ │ │ │ 'temperature': 0.1, │ │ │ │ │ 'top_probability': 1.0, │ │ │ │ │ 'caching': False │ │ │ │ } │ │ │ │ self = <sgpt.handlers.default_handler.DefaultHandler object at 0x7fe9ce59e020> │ │ │ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │ │ │ │ /home//.local/lib/python3.10/site-packages/sgpt/client.py:98 in get_completion │ │ │ │ 95 │ │ :param caching: Boolean value to enable/disable caching. │ │ 96 │ │ :return: String generated completion. │ │ 97 │ │ """ │ │ ❱ 98 │ │ yield from self._request( │ │ 99 │ │ │ messages, │ │ 100 │ │ │ model, │ │ 101 │ │ │ temperature, │ │ │ │ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │ │ │ caching = False │ │ │ │ messages = [ │ │ │ │ │ { │ │ │ │ │ │ 'role': 'user', │ │ │ │ │ │ 'content': '###\nRole name: default\nYou are Command Line App │ │ │ │ ShellGPT, a programming and syst'+358 │ │ │ │ │ } │ │ │ │ ] │ │ │ │ model = '"llama-2-7b-chat.Q5_K_M.gguf"' │ │ │ │ self = <sgpt.client.OpenAIClient object at 0x7fe9ce59e500> │ │ │ │ temperature = 0.1 │ │ │ │ top_probability = 1.0 │ │ │ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │ │ │ │ /home//.local/lib/python3.10/site-packages/sgpt/cache.py:39 in wrapper │ │ │ │ 36 │ │ │ │ yield cache_file.read_text() │ │ 37 │ │ │ │ return │ │ 38 │ │ │ result = "" │ │ ❱ 39 │ │ │ for i in func(*args, **kwargs): │ │ 40 │ │ │ │ result += i │ │ 41 │ │ │ │ yield i │ │ 42 │ │ │ cache_file.write_text(result) │ │ │ │ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │ │ │ args = ( │ │ │ │ │ <sgpt.client.OpenAIClient object at 0x7fe9ce59e500>, │ │ │ │ │ [ │ │ │ │ │ │ { │ │ │ │ │ │ │ 'role': 'user', │ │ │ │ │ │ │ 'content': '###\nRole name: default\nYou are Command Line App │ │ │ │ ShellGPT, a programming and syst'+358 │ │ │ │ │ │ } │ │ │ │ │ ], │ │ │ │ │ '"model"', │ │ │ │ │ 0.1, │ │ │ │ │ 1.0 │ │ │ │ ) │ │ │ │ cache_file = PosixPath('/tmp/cache/34ecb037ca326fa92c799ac5101c802b') │ │ │ │ cache_key = '34ecb037ca326fa92c799ac5101c802b' │ │ │ │ func = <function OpenAIClient._request at 0x7fe9ce5b3010> │ │ │ │ i = 'This' │ │ │ │ kwargs = {} │ │ │ │ result = ' To open port 5900 on the local firewall in Ubuntu 22.04.3 LTS, you can use │ │ │ │ the'+222 │ │ │ │ self = <sgpt.cache.Cache object at 0x7fe9cefcf010> │ │ │ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │ │ │ │ /home//.local/lib/python3.10/site-packages/sgpt/client.py:74 in _request │ │ │ │ 71 │ │ │ │ break │ │ 72 │ │ │ if not data: │ │ 73 │ │ │ │ continue │ │ ❱ 74 │ │ │ data = json.loads(data) # type: ignore │ │ 75 │ │ │ delta = data["choices"][0]["delta"] # type: ignore │ │ 76 │ │ │ if "content" not in delta: │ │ 77 │ │ │ │ continue │ │ │ │ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │ │ │ data = 'ping - 2023-11-09 16:06:40.104941' │ │ │ │ delta = {'content': 'This'} │ │ │ │ endpoint = 'http://127.0.0.01:8000/v1/chat/completions' │ │ │ │ line = b': ping - 2023-11-09 16:06:40.104941' │ │ │ │ messages = [ │ │ │ │ │ { │ │ │ │ │ │ 'role': 'user', │ │ │ │ │ │ 'content': '###\nRole name: default\nYou are Command Line App │ │ │ │ ShellGPT, a programming and syst'+358 │ │ │ │ │ } │ │ │ │ ] │ │ │ │ model = '"model"' │ │ │ │ response = <Response [200]> │ │ │ │ self = <sgpt.client.OpenAIClient object at 0x7fe9ce59e500> │ │ │ │ stream = True │ │ │ │ temperature = 0.1 │ │ │ │ top_probability = 1.0 │ │ │ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │ │ │ │ /usr/lib/python3.10/json/init.py:346 in loads │ │ │ │ 343 │ if (cls is None and object_hook is None and │ │ 344 │ │ │ parse_int is None and parse_float is None and │ │ 345 │ │ │ parse_constant is None and object_pairs_hook is None and not kw): │ │ ❱ 346 │ │ return _default_decoder.decode(s) │ │ 347 │ if cls is None: │ │ 348 │ │ cls = JSONDecoder │ │ 349 │ if object_hook is not None: │ │ │ │ ╭──────────────────────── locals ─────────────────────────╮ │ │ │ cls = None │ │ │ │ kw = {} │ │ │ │ object_hook = None │ │ │ │ object_pairs_hook = None │ │ │ │ parse_constant = None │ │ │ │ parse_float = None │ │ │ │ parse_int = None │ │ │ │ s = 'ping - 2023-11-09 16:06:40.104941' │ │ │ ╰─────────────────────────────────────────────────────────╯ │ │ │ │ /usr/lib/python3.10/json/decoder.py:337 in decode │ │ │ │ 334 │ │ containing a JSON document). │ │ 335 │ │ │ │ 336 │ │ """ │ │ ❱ 337 │ │ obj, end = self.raw_decode(s, idx=_w(s, 0).end()) │ │ 338 │ │ end = _w(s, end).end() │ │ 339 │ │ if end != len(s): │ │ 340 │ │ │ raise JSONDecodeError("Extra data", s, end) │ │ │ │ ╭─────────────────────────────── locals ────────────────────────────────╮ │ │ │ _w = <built-in method match of re.Pattern object at 0x7fe9cf1da4d0> │ │ │ │ s = 'ping - 2023-11-09 16:06:40.104941' │ │ │ │ self = <json.decoder.JSONDecoder object at 0x7fe9cf189b10> │ │ │ ╰───────────────────────────────────────────────────────────────────────╯ │ │ │ │ /usr/lib/python3.10/json/decoder.py:355 in raw_decode │ │ │ │ 352 │ │ try: │ │ 353 │ │ │ obj, end = self.scan_once(s, idx) │ │ 354 │ │ except StopIteration as err: │ │ ❱ 355 │ │ │ raise JSONDecodeError("Expecting value", s, err.value) from None │ │ 356 │ │ return obj, end │ │ 357 │ │ │ │ ╭────────────────────────── locals ──────────────────────────╮ │ │ │ idx = 0 │ │ │ │ s = 'ping - 2023-11-09 16:06:40.104941' │ │ │ │ self = <json.decoder.JSONDecoder object at 0x7fe9cf189b10> │ │ │ ╰────────────────────────────────────────────────────────────╯ │ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯ JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Any advise will be highly appreciated!

TheR1D commented 9 months ago

ShellGPT is designed and tested using OpenAI LLMs. Closing this issue due to its age and lack of similar reports/requests from other users.

kirkog86 commented 8 months ago

Pity as local LLMs getting more and more attention....

TheR1D / shell_gpt

Issues with streaming (local LLM) #367