chrisrude / oobabot

A Discord bot which talks to Large Language Model AIs running on oobabooga's text-generation-webui
MIT License

feature request: streaming using discord's edit message function #50

Closed. yesbroc closed this issue 1 year ago.

jmoney7823956789378 commented 1 year ago

Already implemented, though a little jank at times.

yesbroc commented 1 year ago

How do I use it?

jmoney7823956789378 commented 1 year ago

Use the --stream-responses command-line flag. The same option is available within config.yml too.
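For reference, a minimal config.yml sketch (the exact key spelling is an assumption; it is presumed to mirror the CLI flag, and later comments in this thread refer to it as stream_responses):

```yaml
# config.yml -- enable streaming via Discord message edits.
# Key name assumed to mirror the --stream-responses CLI flag.
stream_responses: true
```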

chrisrude commented 1 year ago

Yup, let me know how it works for you! The challenge with this approach is that Discord rate-limits API requests, so depending on how long your responses are and how fast they're generated, you'll see varying results.

Thanks @jmoney7823956789378 for the tech support! :)
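As an illustration of the rate-limit point above, here is a rough sketch of a throttled streaming-edit loop. This is not oobabot's actual implementation; the function and parameter names are invented for the example, and only channel.send / message.edit are real discord.py calls:

```python
import time


async def stream_to_discord(channel, token_stream, speed_limit: float = 0.7):
    """Sketch only (not oobabot's actual code): buffer incoming tokens and
    edit a single Discord message at most once per `speed_limit` seconds,
    because Discord rate-limits rapid edits to the same message."""
    buffer = ""
    message = None
    last_edit = 0.0
    async for token in token_stream:
        buffer += token
        now = time.monotonic()
        if now - last_edit < speed_limit:
            continue  # too soon since the last edit; keep accumulating
        if message is None:
            message = await channel.send(buffer)  # first chunk: new message
        else:
            await message.edit(content=buffer)  # later chunks: edit in place
        last_edit = now
    if buffer:
        # Flush whatever arrived after the last throttled edit.
        if message is None:
            await channel.send(buffer)
        else:
            await message.edit(content=buffer)
```

If tokens arrive faster than the edit budget allows, the buffer simply grows between edits, which produces the burst-then-catch-up pattern described in the next comment. The sketch also ignores Discord's 2000-character content limit, which comes up later in the thread.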

jmoney7823956789378 commented 1 year ago

No problem! Always got some free time at work. Do you think you'd be able to expose the "streaming rate" as an adjustable option in the config? I'm still getting a big pause after about 10-15 tokens, followed by a big catch-up. This is with a 65B model streaming at ~7 t/s, a 30B at ~12 t/s, etc. I think we would be fine with either a per-token or a per-second update rate.

chrisrude commented 1 year ago

Sure thing! Added in https://github.com/chrisrude/oobabot/commit/bf23d741d60f1bcfae1c9b58afa90f017e7af61e

If you try it, please lmk what values you find useful. I would like to have defaults that work well for most users.

jmoney7823956789378 commented 1 year ago

> Sure thing! Added in bf23d74
>
> If you try it, please lmk what values you find useful. I would like to have defaults that work well for most users.

Awesome stuff! I've been messing with it a little, but it seems like it'll still cause some strange behavior with context if it errors out: when it encounters the error below, the bot completely ignores that message in further conversation. Here's the error I hit most consistently.

 Tewi says:

Cyber Forensics involves investigating digital devices or data to uncover evidence related to crimes or other incidents.

---TRUNCATED LENGTH---

Offensive cyber capabilities may be employed as part of a nation's military strategy, intelligence gathering efforts, or law enforcement operations.

2023-06-22 18:27:53,819 ERROR Error: 400 Bad Request (error code: 50035): Invalid Form Body
In content: Must be 2000 or fewer in length.
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/oobabot/discord_bot.py", line 382, in _send_text_response_in_channel
    last_sent_message = await self._render_streaming_response(
  File "/usr/local/lib/python3.9/dist-packages/oobabot/discord_bot.py", line 538, in _render_streaming_response
    await last_message.edit(
  File "/usr/local/lib/python3.9/dist-packages/discord/message.py", line 2178, in edit
    data = await self._state.http.edit_message(self.channel.id, self.id, params=params)
  File "/usr/local/lib/python3.9/dist-packages/discord/http.py", line 744, in request
    raise HTTPException(response, data)
discord.errors.HTTPException: 400 Bad Request (error code: 50035): Invalid Form Body
In content: Must be 2000 or fewer in length.

This is probably because I have max_new_tokens set to 500, causing far too much text to be sent back and forth when editing the message (and eventually pushing the content past Discord's 2000-character limit). I'm also seeing my oobabot server (an LXC container on a separate machine) just pausing mid-generation. This is the raw receiving end from oobabooga, using --log-all-the-things. Not sure what the cause of that is, or if you're able to reproduce it.

jmoney7823956789378 commented 1 year ago


Here are my findings:

- stream_responses_speed_limit: 0.7 seems to work great, without any pauses or stops.
- Line-feed and carriage-return characters don't seem to trigger the bot to split responses while streaming is enabled; splitting there would help with the oversized HTTP request content.
- Also, for some reason, when using stream_responses, oobabot does not read the streamed message as context for the next message.
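Putting those findings into config form, a sketch of the values reported to work (only the stream_responses_speed_limit key name and the 0.7 value come from this thread; the boolean key spelling and the unit are assumptions):

```yaml
stream_responses: true               # assumed boolean key mirroring --stream-responses
stream_responses_speed_limit: 0.7    # reported to work well; unit assumed to be seconds between edits
```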

chrisrude commented 1 year ago

Glad it worked! I'll test out 0.7 and see how it works as a default setting.

> Also, for some reason, when using stream_responses, oobabot does not read the streamed message as context for the next message.

Thanks for pointing this out! I hadn't noticed this before. I'll see if I can address it.

> Invalid Form Body
> In content: Must be 2000 or fewer in length.

Thanks for pointing this out as well; I hadn't seen it before, as my responses tend to be shorter. But it should be possible to split longer streaming responses across messages.
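A hedged sketch of one way to do that splitting (the helper name is hypothetical; only Discord's 2000-character limit comes from the error above):

```python
DISCORD_MESSAGE_LIMIT = 2000


def split_for_discord(text: str, limit: int = DISCORD_MESSAGE_LIMIT) -> list[str]:
    """Cut a long response into chunks that fit Discord's content limit,
    preferring to break at the last newline (then space) in each chunk."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind("\n", 0, limit)
        if cut <= 0:
            cut = text.rfind(" ", 0, limit)
        if cut <= 0:
            cut = limit  # no natural break point; hard-cut at the limit
        chunks.append(text[:cut])
        text = text[cut:].lstrip(" \n")
    if text:
        chunks.append(text)
    return chunks
```

During streaming, the same idea would apply incrementally: once the buffer is about to exceed the limit, finalize the current message and continue editing a fresh one that holds the overflow.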

obi-o-n-e commented 1 year ago

Can this be pushed through to the "plugin" version 🙏

I would love to set stream_responses_speed_limit to something that doesn't hit the API wall.

chrisrude commented 1 year ago

Yup, working on it soon! It's been a minute since the last release, and I have a lot of improvements that need to get out. :)

I'm not planning to give the speed limit itself a UI widget, but on release you'll be able to change it via the advanced yaml editor.

jmoney7823956789378 commented 1 year ago

> Yup, working on it soon! It's been a minute since the last release, and I have a lot of improvements that need to get out. :)
>
> I'm not planning to give the speed limit itself a UI widget, but on release you'll be able to change it via the advanced yaml editor.

I'm still testing out your latest 0.2.0 changes, and I'm finding that using stream-responses: True still causes the streamed messages to be omitted from chat history (screenshot attached).

chrisrude commented 1 year ago

Yup, I haven't fixed that part yet. I split it off into issue #63 to help me keep track of it. Going to close this one for now.