Optimize token count on message history

Using GPT-3.5 to summarize its own answers shrinks token count by ~80% while preserving context.

This naive prompt works well enough: Summarize this: ${answer}

A more aggressive prompt also does decent token compression: Summarize this and use the fewest and shortest words possible: ${answer}
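A minimal sketch of how the two prompts could be built. The function and type names are made up for illustration; only the two instruction strings come from this issue.

```typescript
// Hypothetical helper: "plain" is the naive prompt, "compressed" is the
// fewest-and-shortest-words variant.
type SummaryStyle = "plain" | "compressed";

function buildSummaryPrompt(answer: string, style: SummaryStyle = "plain"): string {
  const instruction =
    style === "compressed"
      ? "Summarize this and use the fewest and shortest words possible"
      : "Summarize this";
  return `${instruction}: ${answer}`;
}
```

The resulting string would be sent as the user message of a separate chat completion request, one per answer.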

At ~80% smaller, summaries alone can fit roughly 5x as much past conversation into the chat completion request.
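The 5x figure follows from the ~80% reduction: if a summary is 20% of the original size, five times as many fit in the same budget. A rough sketch of that arithmetic is below. The answerShare parameter is my own assumption, added because user prompts are not summarized, so the real gain depends on how much of the history is bot answers.

```typescript
// reduction:   fraction of answer tokens removed by summarizing (0.8 for ~80%).
// answerShare: assumed fraction of history tokens that are bot answers
//              (1 means the history is entirely answers, the best case).
function historyMultiplier(reduction: number, answerShare: number = 1): number {
  if (reduction < 0 || reduction >= 1) {
    throw new RangeError("reduction must be in [0, 1)");
  }
  if (answerShare < 0 || answerShare > 1) {
    throw new RangeError("answerShare must be in [0, 1]");
  }
  // Only the answer portion shrinks; prompts keep their full size.
  const remainingSize = 1 - answerShare * reduction;
  return 1 / remainingSize;
}
```

With reduction 0.8 and a history made up only of answers this gives about 5; if answers are only half of the history tokens, the gain drops to about 1.67.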

Idea:

Add columns to the prompts table: answer_summary, answer_summary_tokens.
In the background, send answers to the ChatGPT API to get summaries and their token counts.
When grabbing past prompts/answers to build context, use answer_summary over answer when available.
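The third step could look something like this. It is only a sketch: answer, answer_summary and answer_summary_tokens are the columns named above, while prompt, prompt_tokens, answer_tokens, the PromptRow shape and the tokenBudget parameter are assumptions about what the table and caller already provide.

```typescript
// Hypothetical row shape for the prompts table.
interface PromptRow {
  prompt: string;
  prompt_tokens: number;
  answer: string;
  answer_tokens: number;
  answer_summary: string | null; // null until the background job has run
  answer_summary_tokens: number | null;
}

interface ContextMessage {
  role: "user" | "assistant";
  content: string;
}

// rows must be ordered oldest to newest. Walks backwards from the newest
// exchange and stops as soon as the next one would exceed the budget.
function buildContext(rows: PromptRow[], tokenBudget: number): ContextMessage[] {
  const messages: ContextMessage[] = [];
  let used = 0;
  for (let i = rows.length - 1; i >= 0; i--) {
    const row = rows[i];
    let answerText = row.answer;
    let answerTokens = row.answer_tokens;
    // Prefer the summary when both it and its token count are available.
    if (row.answer_summary !== null && row.answer_summary_tokens !== null) {
      answerText = row.answer_summary;
      answerTokens = row.answer_summary_tokens;
    }
    const cost = row.prompt_tokens + answerTokens;
    if (used + cost > tokenBudget) break;
    used += cost;
    // Prepend so the final array stays in chronological order.
    messages.unshift(
      { role: "user", content: row.prompt },
      { role: "assistant", content: answerText }
    );
  }
  return messages;
}
```

Rows without a summary yet simply fall back to the full answer, so the background job can lag behind without breaking anything.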

This will roughly double the token I/O (aka cost) per answer, since the whole answer has to be sent back to ChatGPT to be summarized, but it dramatically increases how much conversation fits in the context.
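To put a rough number on the extra cost: each summarization sends the answer back as input and gets the summary as output. A tiny sketch, with a made-up function name and an optional instructionTokens parameter for the "Summarize this:" prefix:

```typescript
// Extra tokens spent to summarize one answer: the full answer goes in again
// as input (plus the instruction), and the summary comes back as output.
function summaryOverheadTokens(
  answerTokens: number,
  summaryTokens: number,
  instructionTokens: number = 0
): number {
  return answerTokens + summaryTokens + instructionTokens;
}
```

For example, a 500-token answer with a 100-token summary costs about 600 extra tokens on top of the 500 that were originally generated.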

Further thinking: It only makes sense to use summaries for messages that are at least N messages old. If we summarize the latest messages, we can ruin the conversation. Consider this example:

Me: Explain to me the human body in great detail.
Bot: {A very long message}
Me: Summarize that.
Bot: {Summary 1}
Me: Wait, summarize that in Spanish.
Bot: {Summary 2}
Me: Summarize that even further by using only simple words.
Bot: {Summary 3}

In this example, using a summary of message 2 (the very long answer) instead of the full message will ruin the context for all subsequent user prompts, since each "that" refers back to the full text of the previous answer. Something to think about.
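One way to express the "N messages old" rule is sketched below. The StoredAnswer shape and the keepFullLastN parameter are hypothetical; picking a good value for N is still the open question.

```typescript
// Hypothetical minimal shape: just the two columns this rule needs.
interface StoredAnswer {
  answer: string;
  answer_summary: string | null;
}

// rows must be ordered oldest to newest. The most recent keepFullLastN
// answers are always sent in full; older ones use the summary if one exists.
function pickAnswerTexts(rows: StoredAnswer[], keepFullLastN: number): string[] {
  return rows.map((row, index) => {
    const age = rows.length - 1 - index; // 0 = most recent answer
    const oldEnough = age >= keepFullLastN;
    return oldEnough && row.answer_summary !== null
      ? row.answer_summary
      : row.answer;
  });
}
```

With keepFullLastN set to 2, the last two answers stay intact, so a follow-up like "Summarize that" still sees the full text it refers to.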
