Closed by iMicknl 5 months ago
Would be good to make this configurable. Perhaps via JSON values that we inject to the AppConfiguration?
| Key | Value | Content Type |
|---|---|---|
| AzureOpenAIEndpoints | `{}` | application/json |
The JSON value could look like this:

```jsonc
[
  {
    "deployment-name": "gpt-35-turbo",
    "distribution-strategy": "priority", // or "random", "round-robin"
    "endpoints": [
      {
        "endpoint": "https://cog-d7knihn7w73zw-swedencentral.openai.azure.com",
        "deployment_name": "gpt-35-turbo-ptu", // optional (if different name)
        "priority": 1
      },
      {
        "endpoint": "https://cog-d7knihn7w73zw-swedencentral.openai.azure.com",
        "priority": 2
      }
    ]
  },
  {
    "deployment-name": "text-embeddings-ada-002",
    "distribution-strategy": "round-robin", // or "random", "priority"
    "endpoints": [
      {
        "endpoint": "https://cog-d7knihn7w73zw-swedencentral.openai.azure.com"
      },
      {
        "endpoint": "https://cog-d7knihn7w73zw-swedencentral.openai.azure.com",
        "deployment_name": "text-embeddings-ada" // optional (if different name)
      }
    ]
  }
]
```
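To make the idea concrete, here is a minimal sketch of how a proxy could resolve an endpoint for a deployment from a config shaped like the JSON above. All names (`pick_endpoint`, the example endpoints) are illustrative assumptions, not code from this repo:

```python
import itertools
import random

# Hypothetical in-memory copy of the AzureOpenAIEndpoints config value.
CONFIG = [
    {
        "deployment-name": "gpt-35-turbo",
        "distribution-strategy": "priority",
        "endpoints": [
            {"endpoint": "https://a.openai.azure.com", "priority": 1},
            {"endpoint": "https://b.openai.azure.com", "priority": 2},
        ],
    }
]

_round_robin = {}  # deployment name -> cycling iterator, kept across calls


def pick_endpoint(deployment: str, unavailable: frozenset = frozenset()) -> dict:
    """Pick the next endpoint for a deployment per its configured strategy."""
    route = next(r for r in CONFIG if r["deployment-name"] == deployment)
    candidates = [e for e in route["endpoints"] if e["endpoint"] not in unavailable]
    strategy = route["distribution-strategy"]
    if strategy == "priority":
        # Lowest priority number wins; ties are broken randomly.
        best = min(e.get("priority", 1) for e in candidates)
        return random.choice([e for e in candidates if e.get("priority", 1) == best])
    if strategy == "round-robin":
        cycler = _round_robin.setdefault(deployment, itertools.cycle(route["endpoints"]))
        return next(cycler)
    return random.choice(candidates)  # "random"


print(pick_endpoint("gpt-35-turbo")["endpoint"])  # https://a.openai.azure.com
```

The `unavailable` set is one possible way to exclude endpoints that recently returned a rate-limit error.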
Other repositories use YAML for this, e.g. https://github.com/timoklimmer/powerproxy-aoai/blob/main/config/config.example.yaml, which would allow us to load a more advanced config. We could even pull the list of available models from the API, since you might want a more advanced workflow.
I came to this implementation based on this article and repo: https://techcommunity.microsoft.com/t5/fasttrack-for-azure/smart-load-balancing-for-openai-endpoints-using-containers/ba-p/4017550
```json
{
  "routes": [
    {
      "name": "gpt-35-turbo",
      "endpoints": [
        {
          "address": "https://primaryinstance.openai.azure.com/",
          "priority": 1
        },
        {
          "address": "https://secondaryinstance.openai.azure.com/",
          "priority": 2
        }
      ]
    },
    {
      "name": "text-embedding-ada-002",
      "endpoints": [
        {
          "address": "https://primaryinstance.openai.azure.com/",
          "priority": 1
        },
        {
          "address": "https://secondaryinstance.openai.azure.com/",
          "priority": 1
        }
      ]
    },
    {
      "name": "gpt-35-turbo-withpolicy",
      "endpoints": [
        {
          "address": "https://primaryinstance.openai.azure.com/",
          "priority": 1
        }
      ]
    }
  ]
}
```
The repo will provision the first two routes as part of the deployment. "gpt-35-turbo-withpolicy" is just an example and is not part of the deployment (yet).
- Several endpoints within a route can have the same priority -> a backend will be picked at random.
- Several endpoints within a route can have different priorities -> the backend will favor priority 1 over priority 2, but will fall back to the priority 2 endpoint(s) when hitting a rate limit on a priority 1 endpoint.
- A route always corresponds to a unique deployment name within the config. The endpoint should have that deployment available (the proxy will not check, just forward). The reason for this is that Azure OpenAI uses the deployment name in the URL.
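The priority-with-fallback behavior described above can be sketched as follows. This is an illustrative outline, not the actual proxy code; `send` is a stand-in for whatever forwards the request to Azure OpenAI:

```python
import random


def forward(route: dict, send) -> str:
    """Try endpoints tier by tier: lowest priority number first,
    random order within a tier, falling back on HTTP 429."""
    priorities = sorted({e["priority"] for e in route["endpoints"]})
    for prio in priorities:
        tier = [e for e in route["endpoints"] if e["priority"] == prio]
        random.shuffle(tier)  # same priority -> pick at random
        for endpoint in tier:
            status, body = send(endpoint["address"])
            if status != 429:  # not rate-limited, return this response
                return body
    raise RuntimeError("all endpoints are rate-limited")


# Usage with a stubbed sender where the primary instance is rate-limited:
route = {
    "endpoints": [
        {"address": "https://primary", "priority": 1},
        {"address": "https://secondary", "priority": 2},
    ]
}


def send(addr):
    return (429, "") if addr == "https://primary" else (200, addr)


print(forward(route, send))  # https://secondary
```

A real implementation would also want to honor the `Retry-After` header before retrying a rate-limited endpoint, as the linked smart load-balancing article does.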
The feature/smartlb PR should implement this.
Currently, a round-robin strategy is used for the AI Proxy. It would be great to have multiple options here.
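One way to support multiple options is a small strategy registry, so the proxy looks up the configured `distribution-strategy` by name. A sketch under assumed names (nothing here is from the current proxy code):

```python
import itertools
import random

# Each strategy is a factory: given a route's endpoints, it returns a
# zero-argument callable that yields the next endpoint to try.

def round_robin(endpoints):
    cycler = itertools.cycle(endpoints)
    return lambda: next(cycler)


def random_choice(endpoints):
    return lambda: random.choice(endpoints)


def priority(endpoints):
    # Keep only the best (lowest-numbered) tier; ties are broken randomly.
    best = min(e.get("priority", 1) for e in endpoints)
    tier = [e for e in endpoints if e.get("priority", 1) == best]
    return lambda: random.choice(tier)


STRATEGIES = {"round-robin": round_robin, "random": random_choice, "priority": priority}

eps = [{"endpoint": "a"}, {"endpoint": "b"}]
nxt = STRATEGIES["round-robin"](eps)
print(nxt()["endpoint"], nxt()["endpoint"], nxt()["endpoint"])  # a b a
```

Adding a new strategy then only means registering one more factory, without touching the request path.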