Hi @Oceania2018, I'm the maintainer of LiteLLM. We provide an open-source proxy for load balancing across Azure, OpenAI, Bedrock, Vertex, and 100+ other LLMs, and it can process 500+ requests/second.

From this thread it looks like you're trying to maximize throughput. I hope our solution makes it easier for you (I'd love feedback if you're trying to do this).
Doc: https://docs.litellm.ai/docs/proxy/reliability
Create a `config.yaml` that puts several deployments behind one shared model name:

```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
      api_version: "2023-05-15"
      api_key: # your Azure API key
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key: # your Azure API key
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key: # your Azure API key
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
```
Then start the proxy with that config:

```shell
litellm --config /path/to/config.yaml
```
Requests to the shared `gpt-4` model name are now load-balanced across all the deployments:

```shell
curl --location 'http://0.0.0.0:8000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "what llm are you"
    }
  ]
}'
```
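Since the proxy exposes an OpenAI-compatible endpoint, you can also call it from the OpenAI Python SDK instead of curl. A minimal sketch, assuming the proxy is running locally on port 8000 as above (the placeholder `api_key` only matters if you've configured keys on the proxy):

```python
from openai import OpenAI

# Point the standard OpenAI client at the LiteLLM proxy.
client = OpenAI(base_url="http://0.0.0.0:8000", api_key="anything")

response = client.chat.completions.create(
    model="gpt-4",  # the shared model_name from config.yaml
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)
```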
Inspired by this load-balancing idea: load balancing across multiple models, providers, and keys avoids hitting any single key's token limits.
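For anyone curious what that looks like without the proxy, here's a minimal round-robin sketch of the idea in Python. The deployment list, `call_deployment` stub, and `RateLimited` exception are all hypothetical placeholders, not LiteLLM's API:

```python
import itertools

# Illustrative deployments: the same model behind several endpoints/keys
# (placeholder values, not real credentials).
deployments = [
    {"api_base": "https://openai-gpt-4-test-v-1.openai.azure.com/", "api_key": "KEY_1"},
    {"api_base": "https://openai-gpt-4-test-v-2.openai.azure.com/", "api_key": "KEY_2"},
]
rotation = itertools.cycle(deployments)

class RateLimited(Exception):
    """Raised by a backend when a key has exhausted its quota."""

def call_deployment(deployment, messages):
    # Placeholder for an actual HTTP call to deployment["api_base"].
    raise RateLimited  # pretend this key is out of quota

def complete(messages):
    # Try each deployment in turn; a rate-limited key just rotates
    # to the next one, so no single key's token limit stalls traffic.
    for _ in range(len(deployments)):
        deployment = next(rotation)
        try:
            return call_deployment(deployment, messages)
        except RateLimited:
            continue
    raise RuntimeError("all deployments are rate-limited")
```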