c0sogi / LLMChat

A full-stack WebUI implementation of large language models, such as ChatGPT or LLaMA.
MIT License

Token conservation #28

Closed Torhamilton closed 1 year ago

Torhamilton commented 1 year ago

I propose we use a two-LLM approach to cut down on the cost of using GPT-4 and all expensive future variants.

This mostly applies if you are using GPT-4, but why use anything else :)

You have:

  1. Initial prompt
  2. Summary
  3. Last question

This may even get GPT-4 to be more focused and on point.
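The proposed layout can be sketched as a simple prompt-composition step; the function name below is illustrative, not from the repo:

```python
def build_prompt(initial_prompt: str, summary: str, last_question: str) -> str:
    """Compose the reduced context sent to the expensive model (e.g. GPT-4).

    Instead of the full message history, only the initial prompt, a running
    summary (produced by a cheaper LLM), and the latest question are sent.
    """
    return "\n\n".join(
        [
            initial_prompt,
            f"Conversation summary:\n{summary}",
            f"User:\n{last_question}",
        ]
    )
```

The cheaper model keeps the summary up to date between turns, so the expensive model's input stays roughly constant in size regardless of conversation length.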

c0sogi commented 1 year ago

I added a feature that performs summarization in the background for messages larger than 512 tokens. When summarization finishes, the result is added to the MessageHistory, and when the message is sent to the LLM, the summarized text is sent instead of the original. The token threshold can be changed in ChatConfig.

5b2d56f0ba18ac65cc3b453bb4830096bc7a6187
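A minimal sketch of the flow described above; the class layout and names are assumptions for illustration, not the actual LLMChat API:

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Analogous to the threshold in ChatConfig; messages above it get summarized.
SUMMARIZE_TOKEN_THRESHOLD = 512


@dataclass
class MessageHistory:
    content: str
    tokens: int
    summarized: Optional[str] = None  # filled in by the background task

    def text_for_llm(self) -> str:
        # Once the summary is ready, send it instead of the original text.
        return self.summarized if self.summarized is not None else self.content


def maybe_summarize(msg: MessageHistory, summarize: Callable[[str], str]) -> None:
    """Summarize only messages above the token threshold.

    In the real app this would run as a background task so the chat
    loop is not blocked while the summary is generated.
    """
    if msg.tokens > SUMMARIZE_TOKEN_THRESHOLD:
        msg.summarized = summarize(msg.content)
```

Short messages pass through untouched, so the overhead only applies where the token savings are worthwhile.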

Torhamilton commented 1 year ago

Perfect!