assistancechat / assistance

Handle Rate Limit Errors #83

Open · SimonBiggs opened this issue 1 year ago

SimonBiggs commented 1 year ago
2023-02-14 17:16:23.205 ERROR: Task exception was never retrieved
future: <Task finished name='Task-4' coro=<_react_to_email() done, defined at /home/simon/git/assistance.chat/src/python/assistance/_api/routers/email.py:61> exception=RateLimitError(message='The server had an error while processing your request. Sorry about that!', http_status=429, request_id=None)>
Traceback (most recent call last):
  File "/home/simon/git/assistance.chat/src/python/assistance/_api/routers/email.py", line 96, in _react_to_email
    await task(email)
  File "/home/simon/git/assistance.chat/src/python/assistance/_agents/email/create.py", line 177, in create_agent
    completions = await openai.Completion.acreate(
  File "/home/simon/git/assistance.chat/src/python/.venv/lib/python3.10/site-packages/openai/api_resources/completion.py", line 45, in acreate
    return await super().acreate(*args, **kwargs)
  File "/home/simon/git/assistance.chat/src/python/.venv/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 217, in acreate
    response, _, api_key = await requestor.arequest(
  File "/home/simon/git/assistance.chat/src/python/.venv/lib/python3.10/site-packages/openai/api_requestor.py", line 311, in arequest
    resp, got_stream = await self._interpret_async_response(result, stream)
  File "/home/simon/git/assistance.chat/src/python/.venv/lib/python3.10/site-packages/openai/api_requestor.py", line 646, in _interpret_async_response
    self._interpret_response_line(
  File "/home/simon/git/assistance.chat/src/python/.venv/lib/python3.10/site-packages/openai/api_requestor.py", line 680, in _interpret_response_line
    raise self.handle_error_response(
openai.error.RateLimitError: The server had an error while processing your request. Sorry about that!
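
A common stopgap here is to retry with exponential backoff rather than let the task die unretrieved. A minimal sketch, assuming the legacy openai 0.x SDK from the traceback above (acreate_with_backoff and max_retries are illustrative names, not part of the repo):

import asyncio

import openai
from openai.error import RateLimitError

async def acreate_with_backoff(max_retries=5, **kwargs):
    # Retry openai.Completion.acreate on rate limit errors,
    # doubling the wait before each new attempt
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return await openai.Completion.acreate(**kwargs)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(delay)
            delay *= 2
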
krrishdholakia commented 11 months ago

hey @SimonBiggs, I'm the maintainer of LiteLLM. We let you create a Router to maximize throughput with load balancing + queuing (beta).

I'd love to get your feedback on whether this solves your issue.

Here's the quick start:

import asyncio
import os

from litellm import Router

model_list = [{ # list of model deployments 
    "model_name": "gpt-3.5-turbo", # model alias 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "azure/chatgpt-v-2", # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo", 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "azure/chatgpt-functioncalling", 
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo", 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "gpt-3.5-turbo", 
        "api_key": os.getenv("OPENAI_API_KEY"),
    }
}]

router = Router(model_list=model_list)

# openai.ChatCompletion.create replacement; acompletion is a coroutine,
# so it needs to run inside an async function
async def main():
    response = await router.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hey, how's it going?"}],
    )
    print(response)

asyncio.run(main())
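
The point of listing three deployments under the same "gpt-3.5-turbo" alias is that the Router can spread requests across both Azure deployments and the OpenAI key, so a 429 from one backend doesn't take down the whole flow the way the single acreate call in the traceback does.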