etkecc / baibot

🤖 A Matrix bot for using different capabilities (text-generation, text-to-speech, speech-to-text, image-generation, etc.) of AI / Large Language Models (OpenAI, Anthropic, etc.)
GNU Affero General Public License v3.0

How can I change the response templates ? #14

Closed gnouts closed 1 month ago

gnouts commented 1 month ago

I'd like to edit the templates BaiBot is using to respond, mostly for the transcription answers but could be nice in general.

In my case, I'd like to remove the little ear from the answer when doing an audio transcription. And since I'm using Element X, where you currently can't copy a message if it's only a quote (starting with >), I'd like baibot to answer with plain, unquoted text.

My use case for this is having a dedicated chat room with the bot and using it as a voice-notes list, and also for drafting messages I can easily forward to other Element rooms.

spantaleev commented 1 month ago

This is currently not configurable.

If you don't wish for Text Generation to happen, I suppose you've changed the 🦻 Speech-to-Text / 🪄 Flow Type setting to only_transcribe (e.g. !bai config room speech-to-text set-flow-type only_transcribe).

This way, the bot does not start a text-generation thread as a response to your audio message, but simply does a one-off reply with its transcription.


When the bot replies within a thread (the default behavior), the > 🦻 prefix is not just for visual indication, but also something that tells the bot that "this is a transcribed message and even though I (the bot) am sending it, it's actually to be considered a user message". So this prefix is a way for the bot to keep state within the conversation itself, so that later when you continue the conversation, it can attribute the message to you when passing it to the text-generation model.
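To illustrate the idea of keeping state in the message itself, here is a minimal Python sketch of how a thread could be turned back into a conversation for a text-generation model. It is not baibot's actual implementation: the function name, the `(sender_is_bot, body)` input shape, the role labels and the exact prefix string (`"> 🦻 "`) are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed exact form of the marker; the thread only shows it as "> 🦻".
TRANSCRIPTION_PREFIX = "> 🦻 "


def attribute_message(sender_is_bot: bool, body: str) -> Tuple[str, str]:
    """Return a (role, text) pair for one message in a thread."""
    if sender_is_bot and body.startswith(TRANSCRIPTION_PREFIX):
        # Sent by the bot, but it is a transcription of the user's audio,
        # so it is attributed to the user and the marker is dropped.
        return ("user", body[len(TRANSCRIPTION_PREFIX):])
    if sender_is_bot:
        return ("assistant", body)
    return ("user", body)


# Hypothetical thread: (sender_is_bot, body)
thread: List[Tuple[bool, str]] = [
    (True, "> 🦻 What is the capital of France?"),
    (True, "The capital of France is Paris."),
    (False, "And of Spain?"),
]

conversation = [attribute_message(is_bot, body) for is_bot, body in thread]
```

Without the marker, the first message would be indistinguishable from something the bot itself said, which is why removing it inside threads is harder than removing it from one-off replies.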

Now.. I suppose we don't necessarily have to use a quote (>), but it is actually a quote of what you're saying, so it made sense to do it.


When the bot replies directly to your message (without a thread) (when 🦻 Speech-to-Text / 🪄 Flow Type is explicitly set to only_transcribe), the conversation would typically not be continuing (since we're not operating in a thread). You may reply to the transcribed message, but that would not currently do anything (in the future, it may trigger the bot).

As such, it's more feasible to remove the 🦻 emoji prefix. Removing the quotation marker (>) may be possible too, although.. it's somewhat strange.

Right now, with both of these indicators present, it's obvious that the bot is replying to an audio message with a quote of what it heard in that message. If it just replied with some text without any such annotations, it would be odd (at least to have this as a default behavior) as that makes it look like the bot is actually replying to what it heard.


Above, I only describe why the default behavior is what it is and why I think it makes sense to keep it this way. I understand that you'd like to customize these settings in your own room - it's a special case and others won't be confused by the lack of indicators.

If you're already using the 🦻 Speech-to-Text / 🪄 Flow Type setting to get half-way there (no threads), I believe we'll need one more setting (e.g. "Speech-to-Text / Response Type") that lets you choose between 2 options:

I wonder if it makes sense to introduce a 3rd option (auto), which makes the response type automatically adapt to the 🦻 Speech-to-Text / 🪄 Flow Type setting:


It also seems like not being able to copy quoted messages is a bug in Element X and should be addressed there.

Still, I understand that your best-case scenario involves directly forwarding messages without editing them. In that case, you don't want them quoted or prefixed with emojis. Only when the transcription gets something wrong would you go in and copy the message to make some edits.

spantaleev commented 1 month ago

https://github.com/etkecc/baibot/issues/17 indicated that our default behavior was broken in certain clients, mistaking our bare > 🦻 Transcribed text replies for a fallback for rich replies.
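A rough Python sketch of how such a mix-up can happen: the Matrix rich-reply fallback puts the quoted original at the top of the body as lines starting with `> `, followed by a blank line, and clients strip those lines before display. The function below is a simplified, hypothetical stripper (not taken from any particular client), and the example bodies are made up.

```python
def strip_reply_fallback(body: str) -> str:
    """Drop leading "> " lines (the reply fallback) and the blank line after them."""
    lines = body.split("\n")
    index = 0
    # Skip every leading line that looks like fallback quoting.
    while index < len(lines) and lines[index].startswith("> "):
        index += 1
    # The fallback is separated from the real reply by one empty line.
    if 0 < index < len(lines) and lines[index] == "":
        index += 1
    return "\n".join(lines[index:])


# A genuine rich reply: fallback quote, blank line, then the actual reply text.
rich_reply = "> <@alice:example.org> Hello\n\nHi there"

# A bare transcription reply in the old format: the whole body looks like a fallback.
transcription = "> 🦻 Remember to buy milk"

stripped_reply = strip_reply_fallback(rich_reply)
stripped_transcription = strip_reply_fallback(transcription)
```

With this kind of stripping, the genuine reply keeps its text, while the old-style transcription reply is left with an empty body.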

Because of it, I have adjusted the default behavior for transcription replies (outside of threads) to not use > 🦻 prefixing. Messages are:

Both of these actions indicate that this is not a bot message, but a transcription based on something it "heard".


This new behavior probably improves (or fixes) your issue. The only potential problem I see is the fact that m.notice is being used. If you wish to forward these messages elsewhere, you should be aware that they may look and behave differently due to this.
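For reference, the difference comes down to the `msgtype` field of the message content. The Python sketch below builds the kind of content a transcription reply might carry. Only the use of `m.notice` comes from this thread; the helper names, the reply relation and the event ID are assumptions for illustration, not baibot's actual code.

```python
from typing import Any, Dict


def transcription_reply_content(text: str, audio_event_id: str) -> Dict[str, Any]:
    """Build m.room.message content for a one-off transcription reply (hypothetical helper)."""
    return {
        # m.notice marks the message as automated; clients usually render it
        # differently from m.text, and bots are expected not to respond to it.
        "msgtype": "m.notice",
        "body": text,
        # Assumption: the reply points at the audio message it transcribes.
        "m.relates_to": {"m.in_reply_to": {"event_id": audio_event_id}},
    }


def as_plain_text(content: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: the same body re-sent as an ordinary m.text message."""
    return {"msgtype": "m.text", "body": content["body"]}


notice = transcription_reply_content("Remember to buy milk", "$audio:example.org")
text_copy = as_plain_text(notice)
```

Copying the body and sending it as a new message (the equivalent of `as_plain_text`) gives an ordinary `m.text` message, whereas a forwarded event may keep its `m.notice` type, depending on the client.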


This feature is part of the new 1.2.0 release.

Here's a screenshot of what transcriptions look like now:

image

gnouts commented 1 month ago

The m.notice indeed makes it weird to forward, but removing the blockquote makes Element X happy. So that's good enough for my usage. Thanks a lot!