Using GPT-3.5 to summarize its own answers shrinks token count by ~80% while preserving context.
This naive prompt works well enough:
Summarize this: ${answer}
This variant also does decent token compression:
Summarize this and use the fewest and shortest words possible: ${answer}
This alone can 5x the amount of past conversation that gets included in the chat completion.
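A quick sketch of where the 5x figure comes from, together with the two prompts as a helper. The function names are made up for illustration, and the ~80% reduction is taken from the observation above rather than measured here.

```python
def summarize_prompt(answer: str, terse: bool = False) -> str:
    """Build one of the two summarization prompts quoted above."""
    if terse:
        return f"Summarize this and use the fewest and shortest words possible: {answer}"
    return f"Summarize this: {answer}"


def context_multiplier(reduction: float) -> float:
    """How many times more history fits in a fixed token budget
    when every message shrinks by `reduction` (0.8 means 80% smaller)."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1 / (1 - reduction)


if __name__ == "__main__":
    print(summarize_prompt("The heart pumps blood.", terse=True))
    # An 80% reduction leaves 20% of the tokens, so about 5 summaries
    # fit in the space of one full answer.
    print(round(context_multiplier(0.8), 2))
```

The multiplier only holds if every message in the window is summarized; mixing full and summarized messages gives something in between.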
Idea:
Add columns to prompts table: answer_summary, answer_summary_tokens
In the background, send answers to ChatGPT API to get summaries + their token counts.
When grabbing past prompts/answers to build context, use answer_summary over answer when available.
This will roughly double the amount of token I/O (aka cost), since every full answer has to be sent back to ChatGPT once more to be summarized, but it dramatically increases how much conversation fits into the context.
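The idea above could look roughly like this. It is only a sketch under assumptions: the prompts table is assumed to have id, prompt and answer columns, SQLite stands in for whatever database is really used, and the summarize callable is a placeholder for the real ChatGPT API call (the fake one here just keeps the first five words).

```python
import sqlite3
from typing import Callable, List, Tuple


def add_summary_columns(db: sqlite3.Connection) -> None:
    """Add the two proposed columns to the (assumed) prompts table."""
    db.execute("ALTER TABLE prompts ADD COLUMN answer_summary TEXT")
    db.execute("ALTER TABLE prompts ADD COLUMN answer_summary_tokens INTEGER")


def summarize_pending(db: sqlite3.Connection,
                      summarize: Callable[[str], Tuple[str, int]]) -> int:
    """Background job: summarize every answer that has no summary yet.
    `summarize` returns (summary, token_count). Returns rows updated."""
    rows = db.execute(
        "SELECT id, answer FROM prompts "
        "WHERE answer IS NOT NULL AND answer_summary IS NULL"
    ).fetchall()
    for row_id, answer in rows:
        summary, tokens = summarize(answer)
        db.execute(
            "UPDATE prompts SET answer_summary = ?, answer_summary_tokens = ? "
            "WHERE id = ?",
            (summary, tokens, row_id),
        )
    db.commit()
    return len(rows)


def build_context(db: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Past (prompt, answer) pairs, using answer_summary when available."""
    return db.execute(
        "SELECT prompt, COALESCE(answer_summary, answer) "
        "FROM prompts ORDER BY id"
    ).fetchall()


def fake_summarize(answer: str) -> Tuple[str, int]:
    """Stand-in for the API call: first five words, word count as tokens."""
    words = answer.split()[:5]
    return " ".join(words), len(words)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE prompts (id INTEGER PRIMARY KEY, prompt TEXT, answer TEXT)")
db.execute(
    "INSERT INTO prompts (prompt, answer) VALUES (?, ?)",
    ("Explain the heart.",
     "The heart is a muscular organ that pumps blood through the body."),
)
add_summary_columns(db)
summarize_pending(db, fake_summarize)
print(build_context(db))
```

Storing answer_summary_tokens next to the summary means the context builder can budget tokens without re-counting them on every request.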
Further thinking: It only makes sense to use summaries for messages that are at least N messages old. If we swap in summaries for the latest messages, we can ruin the conversation. Consider this example:
Me: Explain to me the human body in great detail.
Bot: {A very long message}
Me: Summarize that.
Bot: {Summary 1}
Me: Wait, summarize that in Spanish.
Bot: {Summary 2}
Me: Summarize that even further by using only simple words.
Bot: {Summary 3}
In this example, using the summary of message 2 (the very long answer) instead of the full message will ruin context for all subsequent user prompts, because each of them refers back to the exact text that came before it. Something to think about.
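One way to apply the "N messages old" rule is sketched below. The Exchange class, select_answers and the keep_full parameter are hypothetical names for illustration, and picking a good N is left open.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Exchange:
    prompt: str
    answer: str
    answer_summary: Optional[str] = None  # None until the background job runs


def select_answers(history: List[Exchange], keep_full: int) -> List[str]:
    """Pick the answer text to send for each past exchange.

    The newest `keep_full` exchanges are always sent in full; older ones
    use their summary when one exists, otherwise the full answer.
    """
    cutoff = len(history) - keep_full  # indexes below this are "old enough"
    selected = []
    for index, exchange in enumerate(history):
        old_enough = index < cutoff
        if old_enough and exchange.answer_summary is not None:
            selected.append(exchange.answer_summary)
        else:
            selected.append(exchange.answer)
    return selected


history = [
    Exchange("Explain the human body.", "{A very long message}", "{Short version}"),
    Exchange("Summarize that.", "{Summary 1}", "{Shorter summary 1}"),
    Exchange("Wait, summarize that in Spanish.", "{Summary 2}"),
]
# Keep the two newest exchanges verbatim; only the oldest may be summarized.
print(select_answers(history, keep_full=2))
```

This does not fully solve the example above, since a later prompt can still point at a message older than N, but it keeps the most recent turns intact.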