Azure / api-management-policy-snippets

Re-usable examples of Azure API Management policies

Azure OpenAI + random load balancer policy #110

Closed: ericthomas1 closed this issue 11 months ago

ericthomas1 commented 11 months ago

Hello,

To set up the Random Load Balancer policy, do I need to name the Azure OpenAI Deployed Model the same in each backend?

Example:
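Something along the lines of the repo's random load balancer sample, pointed at two Azure OpenAI backends (a minimal sketch; the hostnames are hypothetical):

<set-variable name="backendId" value="@(new Random(context.RequestId.GetHashCode()).Next(1, 3))" />
<choose>
    <!-- Route roughly half of the requests to each Azure OpenAI backend -->
    <when condition="@(context.Variables.GetValueOrDefault<int>("backendId") == 1)">
        <set-backend-service base-url="https://aoai-eastus.openai.azure.com/openai" />
    </when>
    <when condition="@(context.Variables.GetValueOrDefault<int>("backendId") == 2)">
        <set-backend-service base-url="https://aoai-westus.openai.azure.com/openai" />
    </when>
</choose>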

The issue is the <deployment-name> segment of the request URL. If I set up the Random Load Balancer policy above with two different Azure OpenAI APIM backends, that segment differs between them, so a random portion of requests fail.
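For instance, with hypothetical hostnames and mismatched deployment names, the same request path exists on only one of the two backends:

https://aoai-eastus.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-05-15
https://aoai-westus.openai.azure.com/openai/deployments/gpt35/chat/completions?api-version=2023-05-15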

Hope this makes sense.

Thank you

ericthomas1 commented 11 months ago

Thoughts on this one?

ericthomas1 commented 11 months ago

The answer is "yes". From what I found, to use multiple Azure OpenAI Resources for random load balancing, the Deployed Model names need to be identical.

Example:
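For instance, with the same hypothetical hostnames as above, both resources have a deployment named gpt-35-turbo, so either backend can serve the same request path:

https://aoai-eastus.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-05-15
https://aoai-westus.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-05-15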

If one model is named differently, the random load balancer fails intermittently: a request errors whenever the randomly selected Azure OpenAI resource's Deployed Model name doesn't match the deployment name in the request URL.

ishaan-jaff commented 7 months ago

@ericthomas1 I'm the maintainer of litellm (https://github.com/BerriAI/litellm); we make it easy to load balance across 100+ LLMs and providers. I'd love to know if this solves your problem, and to hear your feedback if something is missing.

Here's how it works:

Router - load balancing (Docs)

LiteLLM allows you to load balance between multiple deployments (Azure, OpenAI). It picks the deployment that is below its rate limit and has used the fewest tokens.

import os

from litellm import Router

model_list = [{ # list of model deployments 
    "model_name": "gpt-3.5-turbo", # model alias 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "azure/chatgpt-v-2", # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo", 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "azure/chatgpt-functioncalling", 
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    }
}, {
    "model_name": "gpt-3.5-turbo", 
    "litellm_params": { # params for litellm completion/embedding call 
        "model": "gpt-3.5-turbo", 
        "api_key": os.getenv("OPENAI_API_KEY"),
    }
}]

router = Router(model_list=model_list)

# openai.ChatCompletion.create replacement
response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)

print(response)