enricoros / big-AGI

Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.
https://big-agi.com
MIT License
5.63k stars 1.3k forks

[BUG] Request too large #636

Closed rohankandwal closed 2 months ago

rohankandwal commented 2 months ago

Description

Getting Request too large error, not sure how to proceed.

[Service Issue] Openai: Too Many Requests (429): "error": { "message": "Request too large for gpt-4o in organization org-Bnfdy1zwXXXXX on tokens per min (TPM): Limit 30000, Requested 35064. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.", "type": "tokens", "param": null, "code": "rate_limit_exceeded" }

Device and browser

Mac os, Arc. Deployed on Vercel

Screenshots and more

No response

Willingness to Contribute

Xtrah commented 2 months ago

This is not related to big-agi. Your token per minute (TPM) rate limit depends on your OpenAI usage tier.

https://platform.openai.com/docs/guides/rate-limits/usage-tiers

rohankandwal commented 2 months ago

@Xtrah Once the chat became too long, I started getting this error with every chat response, even after I opened the same chat the next day.

Xtrah commented 2 months ago

@rohankandwal This issue is not related to big-agi. You are encountering OpenAI's rate limits for your account tier. Your posted error message states this is a "Request too large" error due to exceeding the tokens per minute (TPM) limit for your organization.

To resolve this, you need to reduce the length of your conversations (or the frequency of your requests) so you stay below 30,000 tokens per minute, or upgrade your OpenAI account to a higher tier if you need longer conversations or more tokens per minute.

Please read the OpenAI documentation on rate limits: https://platform.openai.com/docs/guides/rate-limits

enricoros commented 2 months ago

Great feedback @Xtrah - this limit is imposed by the underlying LLM provider (OpenAI), and once you hit it the only solution is to walk the chat back a bit, for example by going back and removing some earlier messages.

We have a tool for cleaning up the chat, called Cleanup, in the top-right menu. (The screenshot shows how it looks in Big-AGI 2, but it's very similar in 1.x.) It will give you an idea of where the most tokens are allocated, and lets you remove older messages (or "hide" them from the LLM in Big-AGI 2).
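The "walk back the chat" approach can be sketched as a simple token-budget trim. This is a hypothetical helper, not big-AGI's actual code; it assumes a rough 4-characters-per-token estimate, whereas the real Cleanup tool shows proper per-message token counts:

```typescript
// Hypothetical sketch: keep only the most recent messages that fit a token budget.
// Token counts are estimated at ~4 characters per token (a common rough heuristic);
// a real implementation would use an actual tokenizer.

interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  text: string;
}

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walk backwards from the newest message, stopping once the budget is spent,
// so the oldest messages are the ones dropped.
function trimToBudget(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].text);
    if (used + cost > maxTokens) break;
    used += cost;
    kept.unshift(messages[i]);
  }
  return kept;
}
```

With a 30,000-token TPM limit, calling `trimToBudget(chat, 30000)` before each request would keep only as much recent history as the limit allows.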

I'll mark this as closed, but please let me know if you have further UI/UX ideas on how to improve the experience once this limit is reached.

rohankandwal commented 2 months ago

@enricoros Thanks for this. Since this can happen sooner than you think, especially in a chat thread that contains code, can we have an automatic cleanup where the oldest messages get removed? This could be a setting to keep the token size smaller.

Since we support a finite set of models and know the error response codes, it shouldn't be too difficult to parse the error and remove older messages automatically, or to show the Cleanup selection window by default alongside the error prompt. Personally, I never associated "Cleanup" with reducing token size; I thought it was just for clearing out some chat history.
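The error-parsing half of this suggestion can be sketched as below. This is a hypothetical illustration, not big-AGI's actual error handling; the regex matches the error text quoted at the top of this issue, and the `RateLimitInfo` field names are my own:

```typescript
// Hypothetical sketch: extract the TPM limit and the requested token count from
// OpenAI's 429 "Request too large" message, so the UI could compute how many
// tokens must be removed and offer (or perform) a cleanup automatically.

interface RateLimitInfo {
  limit: number;     // tokens-per-minute limit for the organization
  requested: number; // tokens the failed request would have consumed
  excess: number;    // tokens that must be trimmed to fit under the limit
}

function parseRateLimitError(message: string): RateLimitInfo | null {
  // Matches e.g. "... on tokens per min (TPM): Limit 30000, Requested 35064."
  const match = /Limit (\d+), Requested (\d+)/.exec(message);
  if (!match) return null;
  const limit = parseInt(match[1], 10);
  const requested = parseInt(match[2], 10);
  return { limit, requested, excess: requested - limit };
}
```

For the error in this issue, the parser would report an excess of 5,064 tokens, which is roughly how much history a cleanup would need to remove before retrying.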