AllYourBot / hostedgpt

An open version of ChatGPT you can host anywhere or run locally.
MIT License

When API response reaches token limit detect that and render a “Continue” button on the front end #403

Open krschacht opened 3 months ago

krschacht commented 3 months ago

It's interesting that you notice a difference between ChatGPT and HostedGPT in this regard, but it's plausible that the algorithm for managing history is different. I actually did something really naive here, intending to go back and optimize it at some point, but I never did. It's right here: https://github.com/AllYourBot/hostedgpt/blob/main/app/services/ai_backend/open_ai.rb#L69

First, the max_tokens should really be:

```ruby
max_length_of_response_for_good_user_experience = 3000  # hard-coded value we can tweak
[ input_tokens + max_length_of_response_for_good_user_experience, context_limit_of_model ].min
```
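As a minimal sketch of that formula (method and parameter names here are hypothetical placeholders, not HostedGPT's actual API):

```ruby
# Hard-coded response budget we can tweak; long enough for a good answer
# without reserving the whole context window for output.
MAX_RESPONSE_TOKENS_FOR_GOOD_UX = 3000

# Cap max_tokens: grant the input plus the response budget, but never
# more than the model's total context limit.
def max_tokens_for(input_tokens, context_limit_of_model)
  [input_tokens + MAX_RESPONSE_TOKENS_FOR_GOOD_UX, context_limit_of_model].min
end
```

For example, with 1,000 input tokens and an 8,192-token model, this yields 4,000; with 7,000 input tokens it clamps to the 8,192 context limit.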

I even added the Tiktoken gem to the project to prepare for accurate token counting, but haven't addressed this. I also never got around to truncating history. It looks like preceding_messages always returns all preceding messages. If I'm reading the code correctly, the preceding messages will eventually exceed the model's context length and start erroring out. This needs to be fixed at some point. The method to get preceding messages should be:

```ruby
preceding_messages_up_to_max_tokens_of(max_input_tokens_allowed)
```

(I'm naively naming these things just for pseudo-code purposes)
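A rough sketch of what that helper could look like. Everything below is illustrative: it uses a crude characters-divided-by-4 token estimate in place of the Tiktoken gem, and the method and message shape are assumptions, not HostedGPT's real code:

```ruby
# Crude stand-in for Tiktoken: ~4 characters per token on average.
def estimated_tokens(text)
  (text.length / 4.0).ceil
end

# Walk the history newest-first and keep messages until the token budget
# runs out, so the most recent context survives truncation. Returns the
# kept messages in their original (oldest-first) order.
def preceding_messages_up_to_max_tokens_of(messages, max_input_tokens_allowed)
  kept = []
  budget = max_input_tokens_allowed
  messages.reverse_each do |message|
    cost = estimated_tokens(message[:content])
    break if cost > budget
    budget -= cost
    kept.unshift(message)
  end
  kept
end
```

Dropping oldest messages first matches what users expect from a chat: the tail of the conversation is what the model most needs to see.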