Closed: Storm0921 closed this issue 5 days ago
Hi, @Storm0921. How did you run the tests, and how did you set up DB-GPT? "LIMIT_MODEL_CONCURRENCY" controls the number of simultaneous requests each Model Worker can handle.
Hello, I use proxyllm: I start a vLLM model myself and then access it through the proxy. I found that limit_model_concurrency cannot control the concurrency in this setup.
@fangyinc Hello, is it correct that the limit_model_concurrency parameter cannot control concurrency when a large model is accessed through a proxy? I would like to confirm this point.
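To illustrate the behavior being discussed, here is a minimal, hypothetical sketch of how a per-worker concurrency cap like LIMIT_MODEL_CONCURRENCY is typically enforced: a semaphore inside the worker process gates how many inference calls run at once. The `Worker` class, method names, and the limit value below are all illustrative assumptions, not DB-GPT's actual implementation. Note that when requests are only forwarded to an external server (e.g. a self-hosted vLLM instance behind proxyllm), the backend's own scheduler decides the real batching, so a cap like this only bounds in-flight proxy requests on the worker side.

```python
import asyncio

LIMIT_MODEL_CONCURRENCY = 5  # assumed value for illustration


class Worker:
    """Hypothetical model worker that caps concurrent generate() calls."""

    def __init__(self, limit: int) -> None:
        self._sem = asyncio.Semaphore(limit)
        self.active = 0  # calls currently inside the semaphore
        self.peak = 0    # highest concurrency observed

    async def generate(self, prompt: str) -> str:
        # Blocks here once `limit` calls are already in flight.
        async with self._sem:
            self.active += 1
            self.peak = max(self.peak, self.active)
            await asyncio.sleep(0.01)  # stand-in for the actual model call
            self.active -= 1
            return f"echo:{prompt}"


async def main() -> int:
    worker = Worker(LIMIT_MODEL_CONCURRENCY)
    # Fire 20 requests at once; the semaphore caps how many overlap.
    await asyncio.gather(*(worker.generate(f"p{i}") for i in range(20)))
    return worker.peak


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In this sketch the observed peak never exceeds the limit, regardless of how many requests arrive, which is the behavior one would expect from the parameter when the model runs inside the worker itself.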
This issue has been marked as stale because it has been over 30 days without any activity.
This issue has been closed because it has been marked as stale and there has been no activity for over 7 days.
Search before asking
Operating system information
Linux
Python version information
DB-GPT version
main
Related scenes
Installation Information
[ ] Installation From Source
[ ] Docker Installation
[ ] Docker Compose Installation
[ ] Cluster Installation
[ ] AutoDL Image
[ ] Other
Device information
A100
Models information
Qwen1.5-72B-chat-Int4
What happened
LIMIT_MODEL_CONCURRENCY
What are the restrictions on this parameter in the env file? The parameter setting does not seem to have any effect: I ran the official vLLM benchmark with it set to 5 and to 10500 respectively, and the results were the same.

What you expected to happen

Could you please explain the meaning of this parameter?
How to reproduce
yes
Additional context
No response
Are you willing to submit PR?