bricks-cloud / BricksLLM

🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.
https://trybricks.ai/
MIT License

Load Balancing #64

Closed cheng92hao closed 3 months ago

cheng92hao commented 3 months ago

Recently, I have been evaluating the large-model gateways LiteLLM and BricksLLM. We plan to use one of them in production, and I would like to ask some questions:

1. Why does BricksLLM not support load balancing across multiple BricksLLM instances? Will you support it in the future?
2. Why does the Router object not support load balancing across multiple upstream instances? There is only a single, simple fallback routing policy.
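For context on the distinction the question draws, here is a minimal sketch contrasting a fallback policy with true load balancing. The upstream names and function names are illustrative only, not BricksLLM's actual Router API:

```python
import random

# Hypothetical upstream endpoints; names are illustrative only.
UPSTREAMS = ["azure-openai-east", "azure-openai-west"]

def route_fallback(healthy: set) -> str:
    """Fallback policy: always prefer the first upstream,
    and move down the list only when the preferred one is unhealthy."""
    for u in UPSTREAMS:
        if u in healthy:
            return u
    raise RuntimeError("no healthy upstream")

def route_load_balanced(healthy: set) -> str:
    """Load balancing: spread traffic across all healthy upstreams."""
    candidates = [u for u in UPSTREAMS if u in healthy]
    if not candidates:
        raise RuntimeError("no healthy upstream")
    return random.choice(candidates)
```

With fallback, the second upstream receives traffic only during an outage of the first; with load balancing, both share traffic in normal operation.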

spikelu2016 commented 3 months ago
  1. For people who need distributed load balancing, our thought is for the route feature to support such use cases.
  2. This is in the works.
spikelu2016 commented 3 months ago

What is the specific use case you have in mind? If you want to discuss privately, I am available in our Discord server

cheng92hao commented 3 months ago

The points we are considering in our company's gateway research include:

  1. Load balancing
  2. Horizontal scaling. If load balancing is implemented solely through routing, there is a single point of failure, and how can a single node handle highly concurrent requests?
cheng92hao commented 3 months ago
> 1. For people who need distributed load balancing, our thought is for the route feature to support such use cases.
> 2. This is in the works.

Currently, I have run some tests and found that BricksLLM's performance is indeed much stronger than LiteLLM's, although some functionality is still missing. What I would also like to understand is: if I implement load balancing at an upper layer, does BricksLLM support deploying multiple instances?

spikelu2016 commented 3 months ago

Should I rephrase the question as: does BricksLLM stay consistent in a distributed environment? The answer is yes, as long as every instance is connected to the same DB. There may be a bottleneck around the maximum number of concurrent DB connections in a distributed environment, which you would need to handle yourself.

Does this answer your question?
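To illustrate why a shared DB keeps distributed instances consistent: every instance records spend in the same table, so a cost limit is checked against the global total rather than a per-instance counter. A minimal sketch, using SQLite to stand in for the shared Postgres (the schema is illustrative, not BricksLLM's actual one):

```python
import sqlite3

# SQLite stands in for the shared database; the schema is hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE spend (api_key TEXT, cost_cents INTEGER)")

def record_spend(conn, api_key, cost_cents):
    """Any gateway instance appends its spend to the shared table."""
    conn.execute("INSERT INTO spend VALUES (?, ?)", (api_key, cost_cents))
    conn.commit()

def total_spend(conn, api_key):
    """Cost limits are enforced against the global total across instances."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(cost_cents), 0) FROM spend WHERE api_key = ?",
        (api_key,),
    ).fetchone()
    return total

record_spend(db, "key-1", 30)  # written by instance A
record_spend(db, "key-1", 45)  # written by instance B
assert total_spend(db, "key-1") == 75  # a limit check sees both writes
```

The connection-count bottleneck mentioned above arises because each replica holds its own pool of connections to this one database, so pooling (e.g. PgBouncer) is the usual mitigation.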

cheng92hao commented 3 months ago

> Should I rephrase the question as: does BricksLLM stay consistent in a distributed environment? The answer is yes, as long as every instance is connected to the same DB. There may be a bottleneck around the maximum number of concurrent DB connections in a distributed environment, which you would need to handle yourself.
>
> Does this answer your question?

Yes, thank you very much! By the way, does BricksLLM support custom pricing for models, e.g. (1) cost per token, (2) cost per second? For custom models, I would like to define the billing method myself.

donfour commented 3 months ago

@cheng92hao

> Should I rephrase the question as: does BricksLLM stay consistent in a distributed environment? The answer is yes, as long as every instance is connected to the same DB. There may be a bottleneck around the maximum number of concurrent DB connections in a distributed environment, which you would need to handle yourself. Does this answer your question?

> Yes, thank you very much! By the way, does BricksLLM support custom pricing for models, e.g. (1) cost per token, (2) cost per second? For custom models, I would like to define the billing method myself.

Not yet, but it's on our roadmap. We plan to allow you to specify a custom cost for prompt tokens and completion tokens, so we can calculate costs for custom models.
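The roadmap item described above amounts to a per-model price table with separate prompt and completion token rates. A minimal sketch of how such pricing could work; the model name, field names, and rates are all hypothetical:

```python
# Hypothetical custom price table: USD per 1k tokens, split by direction.
CUSTOM_PRICING = {
    "my-fine-tuned-model": {
        "prompt_cost_per_1k": 0.002,      # USD per 1k prompt tokens
        "completion_cost_per_1k": 0.004,  # USD per 1k completion tokens
    },
}

def request_cost(model, prompt_tokens, completion_tokens):
    """Compute the cost of one request from the custom price table."""
    p = CUSTOM_PRICING[model]
    return (prompt_tokens / 1000) * p["prompt_cost_per_1k"] \
         + (completion_tokens / 1000) * p["completion_cost_per_1k"]
```

For example, a request with 1000 prompt tokens and 500 completion tokens under these rates would cost $0.002 + $0.002 = $0.004.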

By the way, if you're interested in using BricksLLM for your company, feel free to chat with us here, we can provide 1-on-1 support and implement custom features for enterprise customers :)

cheng92hao commented 3 months ago

> Not yet, but it's on our roadmap. We plan to allow you to specify a custom cost for prompt tokens and completion tokens, so we can calculate costs for custom models.
>
> By the way, if you're interested in using BricksLLM for your company, feel free to chat with us here, we can provide 1-on-1 support and implement custom features for enterprise customers :)

Thank you for your answer. We are currently conducting some research and testing; if any questions come up later, I will be happy to contact you. By the way, if there were more detailed official documentation, that would be perfect.