GonzRon opened 11 months ago
Thank you for your response. The rate-limiting feature of the AICommit plugin has not been implemented yet. If you see such a message, it was issued directly by OpenAI; the plugin only displays the error message.
then why expose the rate limit control to the user in the UI?
I apologize for the inconvenience. The feature initially worked fine until OpenAI changed its rate limits. To fix the issue quickly, we removed the internal implementation while leaving the UI control visible. We will release an update within a few days that hides the control for now. Re-implementing the feature is a priority, and we will make it available again in a future release. Thank you.
thank you kindly for the background on the issue. I was initially confused by the rate limit control + the rate limit reached error.
Thank you for understanding; the UI issue will be fixed soon. Have a nice day :)
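For context on what a re-implemented plugin-side limiter might look like: a minimal sketch of a sliding-window requests-per-minute throttle using only the standard library. This is purely illustrative; the `RpmLimiter` name and API are hypothetical, not part of AICommit.

```python
import threading
import time
from collections import deque


class RpmLimiter:
    """Client-side sliding-window limiter: at most `rpm` calls per 60 seconds."""

    def __init__(self, rpm: int):
        self.rpm = rpm
        self.calls = deque()  # monotonic timestamps of recent calls
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Drop timestamps older than the 60-second window.
                while self.calls and now - self.calls[0] >= 60:
                    self.calls.popleft()
                if len(self.calls) < self.rpm:
                    self.calls.append(now)
                    return
                wait = 60 - (now - self.calls[0])
            time.sleep(wait)
```

A plugin would call `limiter.acquire()` before each API request, so the configured RPM setting is enforced locally regardless of what the provider does.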
@rosuH @GonzRon
I'm the maintainer of LiteLLM. It lets you maximize your throughput and increase your effective rate limits by load balancing across multiple deployments (Azure, OpenAI). I believe LiteLLM could be helpful here, and I'd love your feedback if we're missing something.
Here's how to use it (docs: https://docs.litellm.ai/docs/routing):
import os

from litellm import Router

model_list = [
    {  # list of model deployments
        "model_name": "gpt-3.5-turbo",  # model alias
        "litellm_params": {  # params for litellm completion/embedding call
            "model": "azure/chatgpt-v-2",  # actual model name
            "api_key": os.getenv("AZURE_API_KEY"),
            "api_version": os.getenv("AZURE_API_VERSION"),
            "api_base": os.getenv("AZURE_API_BASE"),
        },
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/chatgpt-functioncalling",
            "api_key": os.getenv("AZURE_API_KEY"),
            "api_version": os.getenv("AZURE_API_VERSION"),
            "api_base": os.getenv("AZURE_API_BASE"),
        },
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "vllm/TheBloke/Marcoroni-70B-v1-AWQ",
            "api_key": os.getenv("OPENAI_API_KEY"),
        },
    },
]

router = Router(model_list=model_list)

# openai.ChatCompletion.create replacement
response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)
print(response)
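Whether or not requests are load balanced, a common complement for surviving transient 429 responses is an exponential-backoff retry around the completion call. A generic sketch follows; the `with_backoff` helper, its parameters, and the delay values are illustrative assumptions, not LiteLLM or OpenAI API:

```python
import random
import time


def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` on exceptions, doubling the delay each attempt with jitter.

    Re-raises the last exception once `max_retries` attempts are exhausted.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with multiplicative jitter to avoid
            # synchronized retry bursts from many clients.
            delay = base_delay * (2 ** attempt) * (0.5 + random.random())
            time.sleep(delay)
```

Usage would look like `with_backoff(lambda: router.completion(...))`, so a rate-limit error triggers progressively longer waits instead of failing immediately.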
Describe the bug: I can't switch to any gpt-4 model because aicommit exceeds the rate limits even though the rate limit is set to 20 RPM.
Steps to reproduce: basic usage, with the rate limit set to 20 RPM.
Expected behavior: the 10K RPM limit should not be exceeded?!
Additional context: