GonzRon opened 11 months ago
Thank you for your response. The rate-limiting feature of the AICommit plugin has not been implemented yet. If you see such a message, it was issued directly by OpenAI; the plugin only displays the error message.
then why expose the rate limit control to the user in the UI?
I apologize for the inconvenience. The feature initially worked fine until OpenAI changed its rate limits. To fix the issue quickly, we removed the internal implementation while leaving the UI control visible. We will release an update within a few days that hides the control for now. Re-implementing the feature is a priority, and we will make it available again in a future release. Thank you.
thank you kindly for the background on the issue. I was initially confused by the rate limit control + the rate limit reached error.
Thank you for understanding; the UI issue will be fixed soon. Have a nice day :)
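For context on what a re-implemented plugin-side limiter might look like: a minimal sketch of a sliding-window requests-per-minute throttle using only the standard library. This is purely illustrative; the `RpmLimiter` name and API are hypothetical, not part of AICommit.

```python
import threading
import time
from collections import deque


class RpmLimiter:
    """Client-side sliding-window limiter: at most `rpm` calls per 60 seconds."""

    def __init__(self, rpm: int):
        self.rpm = rpm
        self.calls = deque()  # monotonic timestamps of recent calls
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Drop timestamps older than the 60-second window.
                while self.calls and now - self.calls[0] >= 60:
                    self.calls.popleft()
                if len(self.calls) < self.rpm:
                    self.calls.append(now)
                    return
                wait = 60 - (now - self.calls[0])
            time.sleep(wait)
```

A plugin would call `limiter.acquire()` before each API request, so the configured RPM setting is enforced locally regardless of what the provider does.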
@rosuH @GonzRon
I'm the maintainer of LiteLLM. It lets you maximize your throughput and increase your effective rate limits by load balancing across multiple deployments (Azure, OpenAI). I believe LiteLLM could be helpful here, and I'd love your feedback if we're missing something.
Here's how to use it (docs: https://docs.litellm.ai/docs/routing):
import os

from litellm import Router

model_list = [
    {  # list of model deployments
        "model_name": "gpt-3.5-turbo",  # model alias
        "litellm_params": {  # params for litellm completion/embedding call
            "model": "azure/chatgpt-v-2",  # actual model name
            "api_key": os.getenv("AZURE_API_KEY"),
            "api_version": os.getenv("AZURE_API_VERSION"),
            "api_base": os.getenv("AZURE_API_BASE"),
        },
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/chatgpt-functioncalling",
            "api_key": os.getenv("AZURE_API_KEY"),
            "api_version": os.getenv("AZURE_API_VERSION"),
            "api_base": os.getenv("AZURE_API_BASE"),
        },
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "vllm/TheBloke/Marcoroni-70B-v1-AWQ",
            "api_key": os.getenv("OPENAI_API_KEY"),
        },
    },
]

router = Router(model_list=model_list)

# openai.ChatCompletion.create replacement
response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)
print(response)
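Whether or not requests are load balanced, a common complement for surviving transient 429 responses is an exponential-backoff retry around the completion call. A generic sketch follows; the `with_backoff` helper, its parameters, and the delay values are illustrative assumptions, not LiteLLM or OpenAI API:

```python
import random
import time


def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` on exceptions, doubling the delay each attempt with jitter.

    Re-raises the last exception once `max_retries` attempts are exhausted.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with multiplicative jitter to avoid
            # synchronized retry bursts from many clients.
            delay = base_delay * (2 ** attempt) * (0.5 + random.random())
            time.sleep(delay)
```

Usage would look like `with_backoff(lambda: router.completion(...))`, so a rate-limit error triggers progressively longer waits instead of failing immediately.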
Describe the bug: I can't switch to any gpt-4 model because aicommit exceeds the rate limits even though the rate limit is set to 20 RPM.
Steps to reproduce: basic usage, with the rate limit set to 20 RPM.
Expected behavior: the 10K RPM limit should not be exceeded?!
Additional context: