Hi @Oceania2018, I'm the maintainer of LiteLLM. We provide an open-source proxy for load balancing across Azure, OpenAI, Bedrock, Vertex, and 100+ other LLMs, and it can process 500+ requests/second.

From this thread it looks like you're trying to maximize throughput. I hope our solution makes it easier for you (I'd love feedback if you're trying to do this).
Doc: https://docs.litellm.ai/docs/proxy/reliability
Create a `config.yaml` that puts several deployments behind one shared model name:

```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
      api_version: "2023-05-15"
      api_key: # your Azure API key
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key: # your Azure API key
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key: # your Azure API key
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
```
Then start the proxy with that config:

```shell
litellm --config /path/to/config.yaml
```
Requests to the shared `gpt-4` model name are now load-balanced across all the deployments:

```shell
curl --location 'http://0.0.0.0:8000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "what llm are you"
    }
  ]
}'
```
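Since the proxy exposes an OpenAI-compatible endpoint, you can also call it from the OpenAI Python SDK instead of curl. A minimal sketch, assuming the proxy is running locally on port 8000 as above (the placeholder `api_key` only matters if you've configured keys on the proxy):

```python
from openai import OpenAI

# Point the standard OpenAI client at the LiteLLM proxy.
client = OpenAI(base_url="http://0.0.0.0:8000", api_key="anything")

response = client.chat.completions.create(
    model="gpt-4",  # the shared model_name from config.yaml
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)
```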
Inspired by this load-balancing idea: load balancing across multiple models, providers, and keys avoids hitting any single key's token limits.
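For anyone curious what that looks like without the proxy, here's a minimal round-robin sketch of the idea in Python. The deployment list, `call_deployment` stub, and `RateLimited` exception are all hypothetical placeholders, not LiteLLM's API:

```python
import itertools

# Illustrative deployments: the same model behind several endpoints/keys
# (placeholder values, not real credentials).
deployments = [
    {"api_base": "https://openai-gpt-4-test-v-1.openai.azure.com/", "api_key": "KEY_1"},
    {"api_base": "https://openai-gpt-4-test-v-2.openai.azure.com/", "api_key": "KEY_2"},
]
rotation = itertools.cycle(deployments)

class RateLimited(Exception):
    """Raised by a backend when a key has exhausted its quota."""

def call_deployment(deployment, messages):
    # Placeholder for an actual HTTP call to deployment["api_base"].
    raise RateLimited  # pretend this key is out of quota

def complete(messages):
    # Try each deployment in turn; a rate-limited key just rotates
    # to the next one, so no single key's token limit stalls traffic.
    for _ in range(len(deployments)):
        deployment = next(rotation)
        try:
            return call_deployment(deployment, messages)
        except RateLimited:
            continue
    raise RuntimeError("all deployments are rate-limited")
```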