andredewes / apim-aoai-smart-loadbalancing

Smart load balancing for OpenAI endpoints and Azure API Management

loadbalance with endpoints having different models #6

Closed sander110419 closed 2 months ago

sander110419 commented 2 months ago

Right now we have multiple instances; all of them have gpt-3.5-turbo, but only some have gpt-4. How can I load balance these so that a request intended for gpt-4 does not get sent to a backend that only supports gpt-3.5-turbo?
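The behaviour I'm hoping for is roughly the following: tag each backend with the deployments it hosts, filter the pool by the deployment name in the request path, and only then do the usual priority/random pick. Here is an illustrative Python sketch of that idea (the backend list, the field names, and the assumption that deployment names match model names are all mine, not something from this repo's policy):

```python
import random

# Hypothetical backend pool: each entry lists which deployments it hosts.
# URLs, priorities, and the "models" field are illustrative only.
BACKENDS = [
    {"url": "https://aoai-east.openai.azure.com",  "priority": 1, "models": {"gpt-35-turbo", "gpt-4"}},
    {"url": "https://aoai-west.openai.azure.com",  "priority": 1, "models": {"gpt-35-turbo"}},
    {"url": "https://aoai-north.openai.azure.com", "priority": 2, "models": {"gpt-35-turbo", "gpt-4"}},
]

def deployment_from_path(path: str) -> str:
    # Azure OpenAI request paths look like /openai/deployments/<name>/chat/completions
    parts = path.strip("/").split("/")
    return parts[parts.index("deployments") + 1]

def pick_backend(request_path: str) -> dict:
    wanted = deployment_from_path(request_path)
    # Keep only backends that actually host the requested deployment.
    eligible = [b for b in BACKENDS if wanted in b["models"]]
    if not eligible:
        raise LookupError(f"No backend hosts deployment '{wanted}'")
    # Prefer the lowest priority tier, then pick randomly within that tier.
    best = min(b["priority"] for b in eligible)
    return random.choice([b for b in eligible if b["priority"] == best])

if __name__ == "__main__":
    print(pick_backend("/openai/deployments/gpt-4/chat/completions")["url"])
    print(pick_backend("/openai/deployments/gpt-35-turbo/chat/completions")["url"])
```

Is something like this possible with the policy as it stands, or would it need a change to how the backend list is defined?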