Rate limit at organisation level

LevwTech commented 9 months ago

OpenAI's rate limiting is applied at the organisation level, not at the user level.

If you have just started using the API, the rate limit for DALL-E 2 is about 5 images per minute.

That limit is shared across all users. It is not a separate allowance for each user.

OpenAI suggests on their website using a backoff algorithm, but I don't think this is a good idea, as each failed attempt counts toward the rate limit.
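
For reference, the backoff approach mentioned above might look roughly like the following. This is only a minimal sketch: `RateLimited`, `send`, and `call_with_backoff` are hypothetical names, not part of any OpenAI client. The returned `attempts` count makes the cost visible, since every retry is one more request against the shared limit.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for the API's rate-limit error."""


def call_with_backoff(send, max_retries=4, base=1.0, cap=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call send() until it succeeds, waiting with exponential backoff and jitter.

    Returns (result, attempts), where attempts counts every call made,
    including the failed ones.
    """
    attempts = 0
    for retry in range(max_retries + 1):
        attempts += 1
        try:
            return send(), attempts
        except RateLimited:
            if retry == max_retries:
                raise  # give up after the last allowed retry
            # Full jitter: wait a random time up to the exponential ceiling.
            sleep(rng() * min(cap, base * 2 ** retry))
```

A request that succeeds on its third try has used three of the five slots available in that minute, which is the concern raised above.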

I suggest creating an SQS message queue whose consumption is throttled to match the rate limit.

Each message in the queue represents a request that needs to be sent to DALL-E 2.

The Lambda function that consumes the queue, triggered when messages are added to SQS, should process only 5 requests per minute. This can be done in a couple of ways:

1) Create a DynamoDB table and insert each message into it once it has been processed, with a TTL field of 1 minute. If at any given time the table already holds 5 or more items, don't process the message and keep it in the queue.

2) Create a scheduled event that runs the SQS consumer function every minute and processes 5 requests. The function should run only on that 1-minute timer, and should not be triggered by SQS additions.
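
The check in option 1 can be sketched as follows. This is an illustration under stated assumptions, not a working AWS integration: `ProcessedWindow` is a hypothetical in-memory stand-in for the DynamoDB table, and `handle_message` and `generate_image` are made-up names. The sketch filters out expired entries itself, because DynamoDB removes TTL-expired items in the background rather than at the exact expiry time, so a real query would likely need the same timestamp filter.

```python
import time


class ProcessedWindow:
    """In-memory stand-in for the table of recently processed messages."""

    def __init__(self, limit=5, window_seconds=60, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._expires_at = []  # one expiry timestamp per recorded message

    def try_acquire(self):
        """Record one message if the window has room; return True on success."""
        now = self.clock()
        # Drop expired entries explicitly instead of relying on TTL cleanup.
        self._expires_at = [t for t in self._expires_at if t > now]
        if len(self._expires_at) >= self.limit:
            return False
        self._expires_at.append(now + self.window_seconds)
        return True


def handle_message(message, window, generate_image):
    """Process the message if allowed; otherwise signal that it stays queued."""
    if not window.try_acquire():
        return "retry"  # leave the message in the queue for a later attempt
    generate_image(message)
    return "done"
```

The sketch records the slot before calling the API rather than after, so that a failed call still counts against the window.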

This design requires moving the sendMessage logic into a separate function that also consumes from an SQS queue, because the consumer of the DALL-E queue will forward its results to the sendMessage queue. You don't need to worry about Meta/WhatsApp's rate limit for that one.
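
The timer-driven variant (option 2), together with the hand-off to a second queue, could look something like this. It is a rough sketch using `collections.deque` as a stand-in for both SQS queues; `drain_batch`, the message fields `user` and `prompt`, and `generate_image` are all assumptions made for illustration.

```python
from collections import deque


def drain_batch(image_queue, send_queue, generate_image, batch_size=5):
    """Run once per timer tick: process at most batch_size queued requests.

    Each result is forwarded to send_queue. Anything beyond the batch stays
    in image_queue for the next tick. Returns the number processed.
    """
    processed = 0
    while image_queue and processed < batch_size:
        request = image_queue.popleft()
        image = generate_image(request["prompt"])
        # Hand the result to the separate sendMessage consumer.
        send_queue.append({"to": request["user"], "image": image})
        processed += 1
    return processed
```

With 7 requests waiting, the first tick handles 5 and the second handles the remaining 2, so a user at the back of the queue waits at least a minute longer.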

LevwTech commented 9 months ago

For v1.0, if there is a rateLimit error, you can simply send the user a message such as: "Please try again after a few minutes".
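
A minimal sketch of that fallback, assuming a hypothetical `RateLimitError` exception and hypothetical `generate_image` and `send_message` callables (none of these are taken from the wa-gpt code):

```python
class RateLimitError(Exception):
    """Hypothetical stand-in for the rate-limit error from the image API."""


RETRY_LATER_TEXT = "Please try again after a few minutes"


def reply_to_user(prompt, generate_image, send_message):
    """Send the generated image, or an instant fallback text if rate limited."""
    try:
        image = generate_image(prompt)
    except RateLimitError:
        # No queueing and no retry: tell the user straight away.
        send_message(RETRY_LATER_TEXT)
        return False
    send_message(image)
    return True
```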

LevwTech commented 9 months ago

done in https://github.com/LevwTech/wa-gpt/commit/9b7fabc3c170c45f3fb7c3f92fbbe9a03fc1d37a

LevwTech commented 8 months ago

Closing this issue, as I have now obtained a higher rate limit from OpenAI. Also, responding to the user with an instant try-again-later message is better than creating an SQS queue throttled to the rate limit, because users want instant replies on WhatsApp.

LevwTech / wa-gpt

Rate limit at organisation level #11