Azure / enterprise-azureai

Unleash the power of Azure AI for your application developers in a secure and manageable way with Azure API Management and the Azure Developer CLI.
MIT License

Add additional distribution strategies to AI Proxy #22

Closed · iMicknl closed this 5 months ago

iMicknl commented 5 months ago

Currently, a round-robin strategy is used for the AI Proxy. It would be great to have multiple options here.
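For context, round-robin simply rotates through the configured backends in order. A minimal Python sketch (the endpoint URLs are placeholders, and this is not the proxy's actual code):

import itertools

# Placeholder backends; in the proxy these would come from configuration.
backends = [
    "https://primaryinstance.openai.azure.com/",
    "https://secondaryinstance.openai.azure.com/",
]

# itertools.cycle walks the list in order and wraps around forever.
_rotation = itertools.cycle(backends)

def next_backend() -> str:
    """Return the next backend in round-robin order."""
    return next(_rotation)

Any additional strategy then reduces to swapping out this selection function.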

iMicknl commented 5 months ago

It would be good to make this configurable. Perhaps via JSON values that we inject into App Configuration?

Key                    Value    Content Type
AzureOpenAIEndpoints   {}       application/json

The JSON value could look like this (a sketch of how the proxy might consume it follows the example):

[
    {
        "deployment-name": "gpt-35-turbo",
        "distribution-strategy": "priority", // or "random", "priority"
        "endpoints": [
            {
                "endpoint": "https://cog-d7knihn7w73zw-swedencentral.openai.azure.com",
                "deployment_name": "gpt-35-turbo-ptu", // optional (if different name)
                "priority": 1
            },
            {
                "endpoint": "https://cog-d7knihn7w73zw-swedencentral.openai.azure.com",
                "priority": 2
            }
        ]
    },
    {
        "deployment-name": "text-embeddings-ada-002",
        "distribution-strategy": "round-robin", // or "random", "priority"
        "endpoints": [
            {
                "endpoint": "https://cog-d7knihn7w73zw-swedencentral.openai.azure.com",
            },
            {
                "endpoint": "https://cog-d7knihn7w73zw-swedencentral.openai.azure.com",
                "deployment_name": "text-embeddings-ada" // optional (if different name)
            }
        ]
    }
]
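To show how the proxy could consume this setting, here is a hedged Python sketch (the function name and per-strategy details are assumptions, not the repo's code; the inline // comments in the example above would have to be stripped, since App Configuration would store plain JSON):

import json
import random

_rr_counter = 0

def select_endpoint(config_json: str, deployment: str) -> dict:
    """Pick an endpoint for a deployment according to its distribution strategy."""
    global _rr_counter
    deployments = json.loads(config_json)  # plain JSON, no // comments
    entry = next(d for d in deployments if d["deployment-name"] == deployment)
    endpoints = entry["endpoints"]
    strategy = entry.get("distribution-strategy", "round-robin")

    if strategy == "random":
        return random.choice(endpoints)
    if strategy == "priority":
        # Lowest priority number wins; ties are broken at random.
        best = min(e.get("priority", 1) for e in endpoints)
        return random.choice([e for e in endpoints if e.get("priority", 1) == best])
    # Default: round-robin via a simple shared counter.
    _rr_counter += 1
    return endpoints[_rr_counter % len(endpoints)]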

Other repositories use YAML for this, e.g. https://github.com/timoklimmer/powerproxy-aoai/blob/main/config/config.example.yaml, which would allow us to load a more advanced config. We could also pull the list of available models from the API, because you might want to support a more advanced workflow.
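If we went the YAML route instead, the parsed structure (and therefore the strategy logic) would be identical; a minimal sketch with PyYAML, using a made-up snippet rather than powerproxy-aoai's actual schema:

import yaml  # pip install pyyaml

raw = """
- deployment-name: gpt-35-turbo
  distribution-strategy: priority
  endpoints:
    - endpoint: https://cog-d7knihn7w73zw-swedencentral.openai.azure.com
      priority: 1
"""

# yaml.safe_load returns the same list-of-dicts shape as json.loads above.
config = yaml.safe_load(raw)
print(config[0]["distribution-strategy"])  # -> priority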

azureholic commented 5 months ago

I came to this implementation based on this article and repo: https://techcommunity.microsoft.com/t5/fasttrack-for-azure/smart-load-balancing-for-openai-endpoints-using-containers/ba-p/4017550

{
  "routes": [
    {
      "name": "gpt-35-turbo",
      "endpoints": [
        {
          "address": "https://primaryinstance.openai.azure.com/",
          "priority": 1
        },
        {
          "address": "https://seondaryinstance.openai.azure.com/",
          "priority": 2
        }
      ]
    },
    {
      "name": "text-embedding-ada-002",
      "endpoints": [
        {
          "address": "https://primaryinstance.openai.azure.com/",
          "priority": 1
        },
        {
          "address": "https://secondaryinstance.openai.azure.com/",
          "priority": 1
        }
      ]
    },
    {
      "name": "gpt-35-turbo-withpolicy",
      "endpoints": [
        {
          "address": "https://primaryinstance.openai.azure.com/",
          "priority": 1
        }
      ]
    }
  ]
}

The repo will provision the first two routes as part of the deployment. "gpt-35-turbo-withpolicy" is just an example and is not part of the deployment (yet).

Several endpoints within a route can have the same priority -> a backend will be picked at random.
Several endpoints within a route can have different priorities -> the proxy will favor priority 1 over priority 2, but will fall back to the priority 2 endpoint(s) when hitting a rate limit on a priority 1 endpoint.

A route always corresponds to a unique deployment name within the config. The endpoint should have that deployment available (the proxy will not check, it just forwards). The reason for this is that Azure OpenAI uses the deployment name in the URL.
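A hedged Python sketch of these two rules combined (the helper names, the cool-down window, and the URL template are illustrative assumptions, not the proxy's actual code):

import random
import time

# Endpoints that recently hit a rate limit, mapped to when they may be retried.
_throttled_until: dict[str, float] = {}

def mark_rate_limited(address: str, retry_after_s: float = 30.0) -> None:
    """Take an endpoint out of rotation for a while after an HTTP 429."""
    _throttled_until[address] = time.monotonic() + retry_after_s

def pick_backend(route: dict) -> dict:
    """Pick randomly among healthy endpoints with the lowest priority number."""
    now = time.monotonic()
    healthy = [e for e in route["endpoints"]
               if _throttled_until.get(e["address"], 0.0) <= now]
    candidates = healthy or route["endpoints"]  # all throttled: retry anyway
    best = min(e["priority"] for e in candidates)
    return random.choice([e for e in candidates if e["priority"] == best])

def backend_url(route: dict, endpoint: dict, api_version: str) -> str:
    """Azure OpenAI keeps the deployment name in the URL path, so the proxy
    only swaps the host and forwards the deployment segment unchanged."""
    base = endpoint["address"].rstrip("/")
    return (f"{base}/openai/deployments/{route['name']}"
            f"/chat/completions?api-version={api_version}")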

azureholic commented 5 months ago

The feature/smartlb PR should implement this.