andredewes / apim-aoai-smart-loadbalancing

Smart load balancing for OpenAI endpoints and Azure API Management

tackle multiple instance deployments with different deployment names #5

Open rehn123 opened 4 months ago

rehn123 commented 4 months ago

If the deployment names differ across instances, the policy returns a 404 Resource Not Found error. What are the possible solutions to this problem?

andredewes commented 4 months ago

In this case you probably want to specify the full deployment URL in the backends, changing each backend URL to something like:

https://andre-openai.openai.azure.com/openai/deployments/deployment1/

And another backend to the same instance but different deployment:

https://andre-openai.openai.azure.com/openai/deployments/deployment2/

Then they will be treated as different “backends”, each with its own throttling status. However, you will need to rework the line of the policy that builds the final URL from the backend URL.

Otherwise, the final URL that the policy builds will contain the deployment path twice, such as https://andre-openai.openai.azure.com/openai/deployments/gt35/deployments/gt35/chat/completions...
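To make the duplication concrete, here is a small Python sketch of it. It is not the policy code. The function name `naive_upstream_url` and the assumption that the policy appends a `deployments/<name>/...` path to the configured backend URL are illustrative only.

```python
def naive_upstream_url(backend_url: str, request_path: str) -> str:
    """Append the request path to the configured backend URL unchanged."""
    return backend_url.rstrip("/") + "/" + request_path.lstrip("/")


# Backend configured with the full deployment URL, as suggested above.
backend = "https://andre-openai.openai.azure.com/openai/deployments/gt35/"

# Assumed remainder of the path that gets appended to the backend URL.
duplicated = naive_upstream_url(backend, "deployments/gt35/chat/completions")
print(duplicated)
```

Because the deployment segment is now present in both the backend URL and the appended path, it shows up twice in the result, which is what produces the 404.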

I haven’t worked out the exact code for this, but we need to extract the host part of the backend URL before setting it as the backend (there is probably a C# expression that can do that for us).
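As a rough illustration of the idea, the Python sketch below splits a full deployment URL into its host part and its deployment name, then rebuilds the final URL so the deployment appears only once. The function names and the `/openai/deployments/<name>/` URL shape are assumptions for this example. In the policy itself the same logic would have to be a C# expression. .NET's `System.Uri` offers `GetLeftPart(UriPartial.Authority)` for the host part, though whether it fits this policy is untested here.

```python
from urllib.parse import urlsplit


def backend_origin(backend_url: str) -> str:
    """Return only the scheme and host, e.g. https://name.openai.azure.com."""
    parts = urlsplit(backend_url)
    return f"{parts.scheme}://{parts.netloc}"


def backend_deployment(backend_url: str) -> str:
    """Return the deployment name from .../openai/deployments/<name>/."""
    segments = [s for s in urlsplit(backend_url).path.split("/") if s]
    return segments[segments.index("deployments") + 1]


def upstream_url(backend_url: str, operation: str) -> str:
    """Build the final URL from the backend's host and its own deployment name."""
    origin = backend_origin(backend_url)
    deployment = backend_deployment(backend_url)
    return f"{origin}/openai/deployments/{deployment}/{operation.lstrip('/')}"


backend = "https://andre-openai.openai.azure.com/openai/deployments/deployment2/"
print(upstream_url(backend, "chat/completions"))
```

With this split, each backend entry carries its own deployment name, so the name the caller used no longer has to match the one on the selected instance.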