Current rate limits are 200 messages or 40,000 tokens per minute, which we would likely hit within 5-6 questions if we use the full context window on every chat. That could be a problem.
We need a system that handles GPT-4 rate limits by falling back to ChatGPT when they occur. Additionally, we could curb spamming by slowing down streaming when a single user sends queries at an implausibly fast rate.
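The fallback idea could be sketched roughly as follows. This is a minimal illustration, not the real OpenAI client: `RateLimitError` and `send_chat` are hypothetical stand-ins for whatever error type and call the actual SDK exposes.

```python
# Hypothetical sketch: fall back from GPT-4 to a cheaper model when the
# primary model's rate limit is hit. RateLimitError and send_chat are
# stand-ins for the real SDK's error type and chat call.
class RateLimitError(Exception):
    pass

def send_chat(model, messages):
    # Placeholder for a real API call; pretend GPT-4 is currently throttled.
    if model == "gpt-4":
        raise RateLimitError("40,000 tokens/minute exceeded")
    return f"[{model}] reply to: {messages[-1]}"

def chat_with_fallback(messages, primary="gpt-4", fallback="gpt-3.5-turbo"):
    try:
        return send_chat(primary, messages)
    except RateLimitError:
        # Degrade gracefully to the fallback model instead of failing
        # the user's request outright.
        return send_chat(fallback, messages)

print(chat_with_fallback(["Hello"]))
```

In a real deployment the except clause would catch the SDK's actual rate-limit exception, and the fallback call could additionally trim the context to stay under the cheaper model's window.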