Thoughts on this one?
The answer is "yes". From what I found, to use multiple Azure OpenAI Resources for random load balancing, the Deployed Model names need to be identical.
Example:
Request URL: https://<my-apim-resource>.azure-api.net/deployments/gpt-4-32k/chat/completions?api-version=2023-05-15
Deployment name: gpt-4-32k
If one model is named differently, the random load balancer fails intermittently: whenever the randomly selected Azure OpenAI Resource's Deployed Model name doesn't match the one in the request URL, that request errors out.
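To make the failure mode concrete, here's a minimal Python sketch of what a random load-balancing policy effectively does. The backend hosts and deployment names below are purely illustrative: the client's URL fixes the deployment segment, the backend is picked at random, and any backend whose deployment name differs returns a 404-style error.

import random

# Hypothetical Azure OpenAI backends behind APIM (hosts and names are made up).
backends = {
    "https://aoai-east.openai.azure.com": {"gpt-4-32k"},       # name matches the request URL
    "https://aoai-west.openai.azure.com": {"gpt-4-32k-west"},  # mismatched name -> failures
}

def route(deployment: str) -> str:
    """Mimic the random load-balancing policy: pick a backend, keep the client's path."""
    base = random.choice(list(backends))
    if deployment not in backends[base]:
        raise RuntimeError(f"404 from {base}: no deployment named {deployment!r}")
    return f"{base}/openai/deployments/{deployment}/chat/completions?api-version=2023-05-15"

# The client always requests "gpt-4-32k", so roughly half the calls fail.
for _ in range(6):
    try:
        print("OK:  ", route("gpt-4-32k"))
    except RuntimeError as exc:
        print("FAIL:", exc)

Naming the deployment identically on every resource makes every branch of the random choice valid, which is why the fix above works.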
@ericthomas1 I'm the maintainer of litellm (https://github.com/BerriAI/litellm) - we let you easily load balance between 100+ LLMs and providers. I'd love to know if this solves your problem, and to hear your feedback if something is missing here.
Here's how it works:
LiteLLM lets you load balance between multiple deployments (Azure, OpenAI). It picks the deployment that is below its rate limit and has used the fewest tokens.
import os

from litellm import Router

model_list = [{  # list of model deployments
    "model_name": "gpt-3.5-turbo",  # model alias
    "litellm_params": {  # params for litellm completion/embedding call
        "model": "azure/chatgpt-v-2",  # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {  # params for litellm completion/embedding call
        "model": "azure/chatgpt-functioncalling",
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {  # params for litellm completion/embedding call
        "model": "gpt-3.5-turbo",
        "api_key": os.getenv("OPENAI_API_KEY"),
    }
}]

router = Router(model_list=model_list)

# openai.ChatCompletion.create replacement
response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)
print(response)
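Note that all three entries share the same "gpt-3.5-turbo" alias, so a single router.completion(model="gpt-3.5-turbo", ...) call can be served by any of the three underlying deployments regardless of their real deployment names - which sidesteps the identical-deployment-name requirement discussed above.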
Hello,
To set up the Random Load Balancer policy, do I need to name each Azure OpenAI Deployed Model the same in each backend?
Example:
APIM gateway: https://apim-<my-org>.azure-api.net
Request URL: https://apim-<my-org>.azure-api.net/deployments/<deployment-name>/chat/completions?api-version=2023-05-15
The issue is the <deployment-name> segment. If I set up the Random Load Balancer policy above and have two different Azure OpenAI APIM backends, the deployment name in the URL differs between them, so a random share of requests fail. Hope this makes sense.
Thank you