Stripping newline characters breaks proper response handling...

xydreen commented 1 year ago

Hey just an FYI, when you strip newline characters from the history (which is inserted into the prompt, obv), you can potentially break the handling of responses because the LLM sees the stripped history and thinks well, that's how I need to format it.

Example:

fc5b88bb3336535e6ad848f3ab61c4cd

You can see that the first response is properly formatted (after the history was cleared, so it's starting anew), but the second one isn't. This is because the LLM saw it's own reply in the history, noticed there were no newline characters, and decided to format its reply similarly.

I noticed a bunch of strip (or similar) methods of dealing with newlines in various aspects of the code, and having newline characters can absolutely create other issues depending on the entire prompt's formatting, but keeping the history identical to what was actually said is very important for proper responses, especially when markdown is involved.

chrisrude commented 1 year ago

Thanks for pointing this out! Will take a look.

chrisrude commented 1 year ago

Addressed in the above commit.

We now won't filter any text from a bot's history. So the example above will work.

However, we still will filter newlines from user-generated messages. The reason for this is that if we don't, a user could emulate our chat history format, and thereby impersonate the bot or other users.

chrisrude / oobabot

Stripping newline characters breaks proper response handling... #76