Description / 描述
Deployed with vLLM: python -m vllm.entrypoints.openai.api_server --model /data2/MiniCPM --host 0.0.0.0 --port 10999 --max-model-len 2048 --served-model-name minicpm --trust_remote_code
Ran a concurrency test with evalscope; it is very slow. Results at 100 concurrency:

Benchmarking summary:
Time taken for tests: 384.675 seconds
Expected number of requests: 0
Number of concurrency: 100
Total requests: 1000
Succeed requests: 1000
Failed requests: 0
Average QPS: 2.600
Average latency: 36.727
Throughput(average output tokens per second): 1225.085
Average time to first token: 36.727
Average input tokens per request: 911.000
Average output tokens per request: 471.259
Average time per output token: 0.00082
Average package per request: 1.000
Average package latency: 36.727
Why is it still this slow when the model has so few parameters? Did I misconfigure something?
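For anyone trying to reproduce this without evalscope, here is a rough client-side sketch of the same 100-way concurrency test against the OpenAI-compatible endpoint started above (1000 requests total, matching the summary); the URL assumes the client runs on the same host, and the prompt and max_tokens are placeholders:

```python
# Crude concurrency test against the OpenAI-compatible
# /v1/chat/completions endpoint started above. The prompt and
# max_tokens are placeholders, not the evalscope dataset.
import asyncio
import time

import aiohttp

URL = "http://localhost:10999/v1/chat/completions"
PAYLOAD = {
    "model": "minicpm",
    "messages": [{"role": "user", "content": "Introduce yourself."}],
    "max_tokens": 512,
}

async def one_request(session: aiohttp.ClientSession) -> float:
    start = time.perf_counter()
    async with session.post(URL, json=PAYLOAD) as resp:
        await resp.json()
    return time.perf_counter() - start

async def main(total: int = 1000, concurrency: int = 100) -> None:
    sem = asyncio.Semaphore(concurrency)

    async def bounded(session: aiohttp.ClientSession) -> float:
        async with sem:
            return await one_request(session)

    start = time.perf_counter()
    async with aiohttp.ClientSession() as session:
        latencies = await asyncio.gather(
            *(bounded(session) for _ in range(total))
        )
    wall = time.perf_counter() - start
    print(f"QPS: {total / wall:.2f}, "
          f"average latency: {sum(latencies) / total:.2f}s")

asyncio.run(main())
```

Note that this measures whole-response latency only; the summary above reports time to first token equal to average latency with one package per request, which suggests the benchmark ran without streaming.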
Case Explanation / 案例解释
No response