InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Under concurrency, requests with very large input token counts break streaming responses #2709

Open zhouyuustc opened 3 days ago

zhouyuustc commented 3 days ago

Checklist

Describe the bug

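As the screenshot shows, I used 5 threads (1001-1005) to send requests at the same time. Each request has different content, each has roughly 80k input tokens, and caching is enabled. This is the first run, with no cache hits; these are the response times of the 5 requests: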

[Image: b06993d3498e36bb7313512b25bf770]

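This is the second run, with the input of the first request (1001) modified so that it cannot hit the cache. In theory, only that request should miss the cache and need a long processing time. In practice, only 1002 received its first frame quickly; for 1003-1005, the first frame and the last frame arrived at almost the same moment, at around 30 s. Clearly the other four requests were not processed normally.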

[Image: 408bee5a1b4a9ab204812acc54e4745]

Note: the model is Qwen2.5 14B (AWQ-quantized), deployed on a single A800, with prefix caching and KV cache quantization enabled.

Reproduction

First, start the model with the following command: `CUDA_VISIBLE_DEVICES=5 lmdeploy serve api_server /mnt/qwen2.5/qwen14bInt/Qwen/Qwen2___5-14B-Instruct-AWQ --backend turbomind --server-port 35553 --model-name qwenInt4 --model-format awq --session-len 100000 --cache-block-seq-len 512 --max-batch-size 512 --enable-prefix-caching --log-level INFO --cache-max-entry-count 0.8 --quant-policy=4 >> /mnt/qwen2.5/qwenInt4/qwen14btmp1.txt 2>&1`

Then test it with the following script:

```python
import requests
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

url = "http://localhost:35553/v1/chat/completions"
headers = {'Content-Type': 'application/json'}


def make_request(thread_id, content):
    data = {
        "model": "qwenInt4",
        "messages": [{"role": "user", "content": content}],
        "temperature": 0.1,
        "top_p": 1,
        "max_tokens": 2000,
        "stream": True
    }

    start_time = time.time()
    with requests.post(url, headers=headers, json=data, stream=True) as response:
        first_frame_time = None
        for line in response.iter_lines(decode_unicode=True):
            if first_frame_time is None:
                # Record when the first frame arrives
                first_frame_time = time.time() - start_time
            # Print each line, or run other logic
            # print(f"Thread-{thread_id}: {line}")

    end_time = time.time()
    return (thread_id, first_frame_time, end_time - start_time)


def run_threads(n_threads, contents, start_id):
    with ThreadPoolExecutor(max_workers=n_threads) as executor:
        futures = [executor.submit(make_request, start_id + i, contents[i])
                   for i in range(n_threads)]
        for future in as_completed(futures):
            thread_id, first_frame_time, total_time = future.result()
            print(f"Thread-{thread_id} first frame: {first_frame_time:.3f} s, "
                  f"total time: {total_time:.3f} s")


if __name__ == '__main__':
    contents = ["", "", "", "", ""]  # each thread sends different content
    n_threads = len(contents)
    start_id = 1001  # starting value for the thread names
    print(f"\nRunning with {n_threads} threads:")
    run_threads(n_threads, contents, start_id)
```

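Finally, put 5 different strings into `contents` and run the script to get the results of the first run. Then change the string at `contents[0]` and run it again to get the results of the second run.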
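For anyone reproducing this without the original prompts, one possible way to fill `contents` is sketched below. `build_contents`, `approx_words`, and `salt` are hypothetical names that are not part of the script above, and the word count is only a guess that would need tuning against the model's tokenizer to land near 80k tokens. The change for the second run is placed at the very start of prompt 0, on the assumption that prefix caching can only reuse blocks before the first differing token.

```python
import random


def build_contents(n, approx_words=60000, salt=""):
    """Return n distinct long prompts (hypothetical helper).

    Only prompt 0 depends on `salt`, so passing a new salt changes
    request 0 from its first characters and leaves the others identical.
    """
    contents = []
    for i in range(n):
        tag = f"{i}-{salt}" if i == 0 else str(i)
        rng = random.Random(tag)  # string seed keeps each prompt reproducible
        body = " ".join(f"w{rng.randrange(10**6)}" for _ in range(approx_words))
        contents.append(f"[{tag}] {body}")
    return contents


first_run = build_contents(5)
second_run = build_contents(5, salt="changed")
```

Passing `first_run` and then `second_run` as `contents` would approximate the two runs described above: requests 1002-1005 resend identical prompts, while 1001 differs from the start.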

Environment

(base) root@bac2bfc84cdf:/# lmdeploy check_env
sys.platform: linux
Python: 3.10.14 (main, May 29 2024, 23:47:02) [GCC 11.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7: NVIDIA Graphics Device
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.1, V12.1.105
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.3.1+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.3.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

TorchVision: 0.18.1+cu121
LMDeploy: 0.5.3+
transformers: 4.44.0
gradio: 4.42.0
fastapi: 0.111.0
pydantic: 2.7.2
triton: 2.3.1
NVIDIA Topology:
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    mlx5_0  mlx5_1  mlx5_2  mlx5_3  mlx5_4  mlx5_5  mlx5_6 mlx5_7   CPU Affinity    NUMA Affinity
GPU0     X      NV8     NV8     NV8     NV8     NV8     NV8     NV8     PXB     PXB     NODE    NODE    SYS     SYS     SYS    SYS      0-31,64-95      0
GPU1    NV8      X      NV8     NV8     NV8     NV8     NV8     NV8     PXB     PXB     NODE    NODE    SYS     SYS     SYS    SYS      0-31,64-95      0
GPU2    NV8     NV8      X      NV8     NV8     NV8     NV8     NV8     NODE    NODE    PXB     PXB     SYS     SYS     SYS    SYS      0-31,64-95      0
GPU3    NV8     NV8     NV8      X      NV8     NV8     NV8     NV8     NODE    NODE    PXB     PXB     SYS     SYS     SYS    SYS      0-31,64-95      0
GPU4    NV8     NV8     NV8     NV8      X      NV8     NV8     NV8     SYS     SYS     SYS     SYS     PXB     PXB     NODE   NODE     32-63,96-127    1
GPU5    NV8     NV8     NV8     NV8     NV8      X      NV8     NV8     SYS     SYS     SYS     SYS     PXB     PXB     NODE   NODE     32-63,96-127    1
GPU6    NV8     NV8     NV8     NV8     NV8     NV8      X      NV8     SYS     SYS     SYS     SYS     NODE    NODE    PXB    PXB      32-63,96-127    1
GPU7    NV8     NV8     NV8     NV8     NV8     NV8     NV8      X      SYS     SYS     SYS     SYS     NODE    NODE    PXB    PXB      32-63,96-127    1
mlx5_0  PXB     PXB     NODE    NODE    SYS     SYS     SYS     SYS      X      PIX     NODE    NODE    SYS     SYS     SYS    SYS
mlx5_1  PXB     PXB     NODE    NODE    SYS     SYS     SYS     SYS     PIX      X      NODE    NODE    SYS     SYS     SYS    SYS
mlx5_2  NODE    NODE    PXB     PXB     SYS     SYS     SYS     SYS     NODE    NODE     X      PIX     SYS     SYS     SYS    SYS
mlx5_3  NODE    NODE    PXB     PXB     SYS     SYS     SYS     SYS     NODE    NODE    PIX      X      SYS     SYS     SYS    SYS
mlx5_4  SYS     SYS     SYS     SYS     PXB     PXB     NODE    NODE    SYS     SYS     SYS     SYS      X      PIX     NODE   NODE
mlx5_5  SYS     SYS     SYS     SYS     PXB     PXB     NODE    NODE    SYS     SYS     SYS     SYS     PIX      X      NODE   NODE
mlx5_6  SYS     SYS     SYS     SYS     NODE    NODE    PXB     PXB     SYS     SYS     SYS     SYS     NODE    NODE     X     PIX
mlx5_7  SYS     SYS     SYS     SYS     NODE    NODE    PXB     PXB     SYS     SYS     SYS     SYS     NODE    NODE    PIX     X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

Error traceback

Below are the lmdeploy logs from the two script runs, with the input prompts filtered out (otherwise the output would be too long).

(base) root@bac2bfc84cdf:/# tail -f /mnt/qwen2.5/qwenInt4/qwen14btmp1.txt | grep -v "You are a helpful assistant"
[TM][INFO] [BlockManager] max_block_count = 2191
[TM][INFO] [BlockManager] chunk_size = 2191
[TM][INFO] LlamaBatch<T>::Start()
HINT:    Please open http://0.0.0.0:35553 in a browser for detailed api usage!!!
HINT:    Please open http://0.0.0.0:35553 in a browser for detailed api usage!!!
HINT:    Please open http://0.0.0.0:35553 in a browser for detailed api usage!!!
INFO:     Started server process [1885900]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:35553 (Press CTRL+C to quit)
INFO:     36.32.8.98:3687 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-11-05 09:57:16,657 - lmdeploy - INFO - session_id=1, history_tokens=0, input_tokens=84015, max_new_tokens=2000, seq_start=True, seq_end=True, step=0, prep=True
2024-11-05 09:57:16,657 - lmdeploy - INFO - Register stream callback for 1
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 1 received.
[TM][INFO] [Forward] [0, 1), dc=0, pf=1, sum_q=8192, sum_k=8192, max_q=8192, max_k=8192
INFO:     36.32.8.98:3693 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-11-05 09:57:16,859 - lmdeploy - INFO - session_id=2, history_tokens=0, input_tokens=86606, max_new_tokens=2000, seq_start=True, seq_end=True, step=0, prep=True
2024-11-05 09:57:16,859 - lmdeploy - INFO - Register stream callback for 2
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
INFO:     36.32.8.98:3691 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-11-05 09:57:17,608 - lmdeploy - INFO - session_id=3, history_tokens=0, input_tokens=80764, max_new_tokens=2000, seq_start=True, seq_end=True, step=0, prep=True
2024-11-05 09:57:17,608 - lmdeploy - INFO - Register stream callback for 3
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
INFO:     36.32.8.98:3686 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[TM][INFO] [ProcessInferRequests] Request for 2 received.
[TM][INFO] [ProcessInferRequests] Request for 3 received.
[TM][INFO] ------------------------- step = 0 -------------------------
[TM][INFO] [Forward] [0, 1), dc=0, pf=1, sum_q=8704, sum_k=16896, max_q=8704, max_k=16896
2024-11-05 09:57:17,788 - lmdeploy - INFO - session_id=4, history_tokens=0, input_tokens=76858, max_new_tokens=2000, seq_start=True, seq_end=True, step=0, prep=True
2024-11-05 09:57:17,788 - lmdeploy - INFO - Register stream callback for 4
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
INFO:     36.32.8.98:3692 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-11-05 09:57:17,970 - lmdeploy - INFO - session_id=5, history_tokens=0, input_tokens=85709, max_new_tokens=2000, seq_start=True, seq_end=True, step=0, prep=True
2024-11-05 09:57:17,971 - lmdeploy - INFO - Register stream callback for 5
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 4 received.
[TM][INFO] [ProcessInferRequests] Request for 5 received.
[TM][INFO] [Forward] [0, 1), dc=0, pf=1, sum_q=8704, sum_k=25600, max_q=8704, max_k=25600
[TM][INFO] [Forward] [0, 1), dc=0, pf=1, sum_q=8704, sum_k=34304, max_q=8704, max_k=34304
[TM][INFO] [Forward] [0, 1), dc=0, pf=1, sum_q=8704, sum_k=43008, max_q=8704, max_k=43008
[TM][INFO] [Forward] [0, 1), dc=0, pf=1, sum_q=8704, sum_k=51712, max_q=8704, max_k=51712
[TM][INFO] [Forward] [0, 1), dc=0, pf=1, sum_q=8704, sum_k=60416, max_q=8704, max_k=60416
[TM][INFO] [Forward] [0, 1), dc=0, pf=1, sum_q=8704, sum_k=69120, max_q=8704, max_k=69120
[TM][INFO] [Forward] [0, 1), dc=0, pf=1, sum_q=8704, sum_k=77824, max_q=8704, max_k=77824
[TM][INFO] [Forward] [0, 2), dc=0, pf=2, sum_q=8704, sum_k=86528, max_q=6191, max_k=84015
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=8704, sum_k=11216, max_q=8703, max_k=84016
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=8704, sum_k=19919, max_q=8703, max_k=84017
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=8704, sum_k=28622, max_q=8703, max_k=84018
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=8704, sum_k=37325, max_q=8703, max_k=84019
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=8704, sum_k=46028, max_q=8703, max_k=84020
[TM][INFO] ------------------------- step = 84020 -------------------------
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=8704, sum_k=54731, max_q=8703, max_k=84021
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=8704, sum_k=63434, max_q=8703, max_k=84022
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=8704, sum_k=72137, max_q=8703, max_k=84023
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=8704, sum_k=80840, max_q=8703, max_k=84024
[TM][INFO] [Forward] [0, 3), dc=1, pf=2, sum_q=8704, sum_k=89543, max_q=5766, max_k=86606
[TM][INFO] [Forward] [0, 3), dc=2, pf=1, sum_q=8704, sum_k=11639, max_q=8702, max_k=86607
[TM][INFO] [Forward] [0, 3), dc=2, pf=1, sum_q=8704, sum_k=20341, max_q=8702, max_k=86608
[TM][INFO] [Forward] [0, 3), dc=2, pf=1, sum_q=8704, sum_k=29043, max_q=8702, max_k=86609
[TM][INFO] [Forward] [0, 3), dc=2, pf=1, sum_q=8704, sum_k=37745, max_q=8702, max_k=86610
[TM][INFO] ------------------------- step = 86610 -------------------------
[TM][INFO] [Forward] [0, 3), dc=2, pf=1, sum_q=8704, sum_k=46447, max_q=8702, max_k=86611
[TM][INFO] [Forward] [0, 3), dc=2, pf=1, sum_q=8704, sum_k=55149, max_q=8702, max_k=86612
[TM][INFO] [Forward] [0, 3), dc=2, pf=1, sum_q=8704, sum_k=63851, max_q=8702, max_k=86613
[TM][INFO] [Forward] [0, 3), dc=2, pf=1, sum_q=8704, sum_k=72553, max_q=8702, max_k=86614
[TM][INFO] [Forward] [0, 4), dc=2, pf=2, sum_q=8704, sum_k=81255, max_q=8211, max_k=86615
[TM][INFO] [Forward] [0, 4), dc=3, pf=1, sum_q=8704, sum_k=9192, max_q=8701, max_k=86616
[TM][INFO] [Forward] [0, 4), dc=3, pf=1, sum_q=8704, sum_k=17893, max_q=8701, max_k=86617
[TM][INFO] [Forward] [0, 4), dc=3, pf=1, sum_q=8704, sum_k=26594, max_q=8701, max_k=86618
[TM][INFO] [Forward] [0, 4), dc=3, pf=1, sum_q=8704, sum_k=35295, max_q=8701, max_k=86619
[TM][INFO] [Forward] [0, 4), dc=3, pf=1, sum_q=8704, sum_k=43996, max_q=8701, max_k=86620
[TM][INFO] ------------------------- step = 86620 -------------------------
[TM][INFO] [Forward] [0, 4), dc=3, pf=1, sum_q=8704, sum_k=52697, max_q=8701, max_k=86621
[TM][INFO] [Forward] [0, 4), dc=3, pf=1, sum_q=8704, sum_k=61398, max_q=8701, max_k=86622
[TM][INFO] [Forward] [0, 4), dc=3, pf=1, sum_q=8704, sum_k=70099, max_q=8701, max_k=86623
[TM][INFO] [Forward] [0, 5), dc=3, pf=2, sum_q=8704, sum_k=78800, max_q=6759, max_k=86624
[TM][INFO] [Forward] [0, 5), dc=4, pf=1, sum_q=8704, sum_k=10642, max_q=8700, max_k=86625
[TM][INFO] [Forward] [0, 5), dc=4, pf=1, sum_q=8704, sum_k=19342, max_q=8700, max_k=86626
[TM][INFO] [Forward] [0, 5), dc=4, pf=1, sum_q=8704, sum_k=28042, max_q=8700, max_k=86627
[TM][INFO] [Forward] [0, 5), dc=4, pf=1, sum_q=8704, sum_k=36742, max_q=8700, max_k=86628
[TM][INFO] [Forward] [0, 5), dc=4, pf=1, sum_q=8704, sum_k=45442, max_q=8700, max_k=86629
[TM][INFO] [Forward] [0, 5), dc=4, pf=1, sum_q=8704, sum_k=54142, max_q=8700, max_k=86630
[TM][INFO] ------------------------- step = 86630 -------------------------
[TM][INFO] [Forward] [0, 5), dc=4, pf=1, sum_q=8704, sum_k=62842, max_q=8700, max_k=86631
[TM][INFO] [Forward] [0, 5), dc=4, pf=1, sum_q=8704, sum_k=71542, max_q=8700, max_k=86632
[TM][INFO] [Forward] [0, 5), dc=4, pf=1, sum_q=8704, sum_k=80242, max_q=8700, max_k=86633
[TM][INFO] [Forward] [0, 5), dc=4, pf=1, sum_q=5471, sum_k=85709, max_q=5467, max_k=86634
[TM][INFO] ------------------------- step = 86640 -------------------------
[TM][INFO] [Interrupt] slot = 2, id = 3
[TM][INFO] [forward] Request completed for 3
[TM][INFO] ------------------------- step = 86650 -------------------------
[TM][INFO] ------------------------- step = 86660 -------------------------
[TM][INFO] ------------------------- step = 86670 -------------------------
[TM][INFO] ------------------------- step = 86680 -------------------------
2024-11-05 09:59:29,812 - lmdeploy - INFO - UN-register stream callback for 3
[TM][INFO] ------------------------- step = 86690 -------------------------
[TM][INFO] ------------------------- step = 86700 -------------------------
[TM][INFO] ------------------------- step = 86710 -------------------------
[TM][INFO] ------------------------- step = 86720 -------------------------
[TM][INFO] ------------------------- step = 86730 -------------------------
[TM][INFO] ------------------------- step = 86740 -------------------------
[TM][INFO] ------------------------- step = 86750 -------------------------
[TM][INFO] ------------------------- step = 86760 -------------------------
[TM][INFO] ------------------------- step = 86770 -------------------------
[TM][INFO] ------------------------- step = 86780 -------------------------
[TM][INFO] ------------------------- step = 86790 -------------------------
[TM][INFO] ------------------------- step = 86800 -------------------------
[TM][INFO] ------------------------- step = 86810 -------------------------
[TM][INFO] ------------------------- step = 86820 -------------------------
[TM][INFO] ------------------------- step = 86830 -------------------------
[TM][INFO] ------------------------- step = 86840 -------------------------
[TM][INFO] ------------------------- step = 86850 -------------------------
[TM][INFO] ------------------------- step = 86860 -------------------------
[TM][INFO] ------------------------- step = 86870 -------------------------
[TM][INFO] ------------------------- step = 86880 -------------------------
[TM][INFO] ------------------------- step = 86890 -------------------------
[TM][INFO] [Interrupt] slot = 3, id = 5
[TM][INFO] [forward] Request completed for 5
2024-11-05 09:59:34,535 - lmdeploy - INFO - UN-register stream callback for 5
[TM][INFO] ------------------------- step = 86900 -------------------------
[TM][INFO] ------------------------- step = 86910 -------------------------
[TM][INFO] [Interrupt] slot = 2, id = 4
[TM][INFO] [forward] Request completed for 4
2024-11-05 09:59:34,920 - lmdeploy - INFO - UN-register stream callback for 4
[TM][INFO] ------------------------- step = 86920 -------------------------
[TM][INFO] [Interrupt] slot = 1, id = 2
[TM][INFO] [forward] Request completed for 2
2024-11-05 09:59:34,996 - lmdeploy - INFO - UN-register stream callback for 2
[TM][INFO] ------------------------- step = 84350 -------------------------
[TM][INFO] ------------------------- step = 84360 -------------------------
[TM][INFO] ------------------------- step = 84370 -------------------------
[TM][INFO] ------------------------- step = 84380 -------------------------
[TM][INFO] ------------------------- step = 84390 -------------------------
[TM][INFO] ------------------------- step = 84400 -------------------------
[TM][INFO] ------------------------- step = 84410 -------------------------
[TM][INFO] ------------------------- step = 84420 -------------------------
[TM][INFO] ------------------------- step = 84430 -------------------------
[TM][INFO] ------------------------- step = 84440 -------------------------
[TM][INFO] ------------------------- step = 84450 -------------------------
[TM][INFO] [Interrupt] slot = 0, id = 1
[TM][INFO] [forward] Request completed for 1
2024-11-05 09:59:36,387 - lmdeploy - INFO - UN-register stream callback for 1

INFO:     36.32.8.98:21759 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-11-05 10:04:48,639 - lmdeploy - INFO - session_id=6, history_tokens=0, input_tokens=85709, max_new_tokens=2000, seq_start=True, seq_end=True, step=0, prep=True
2024-11-05 10:04:48,639 - lmdeploy - INFO - Register stream callback for 6
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 6 received.
[TM][INFO] [Forward] [0, 1), dc=0, pf=1, sum_q=205, sum_k=85709, max_q=205, max_k=85709
INFO:     36.32.8.98:59091 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-11-05 10:04:48,811 - lmdeploy - INFO - session_id=7, history_tokens=0, input_tokens=76861, max_new_tokens=2000, seq_start=True, seq_end=True, step=0, prep=True
2024-11-05 10:04:48,812 - lmdeploy - INFO - Register stream callback for 7
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
INFO:     36.32.8.98:59088 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[TM][INFO] [ProcessInferRequests] Request for 7 received.
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=8192, sum_k=8191, max_q=8191, max_k=85710
2024-11-05 10:04:48,932 - lmdeploy - INFO - session_id=8, history_tokens=0, input_tokens=84015, max_new_tokens=2000, seq_start=True, seq_end=True, step=0, prep=True
2024-11-05 10:04:48,932 - lmdeploy - INFO - Register stream callback for 8
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 8 received.
[TM][INFO] ------------------------- step = 85710 -------------------------
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=8192, sum_k=16382, max_q=8191, max_k=85711
INFO:     36.32.8.98:62171 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-11-05 10:04:50,396 - lmdeploy - INFO - session_id=9, history_tokens=0, input_tokens=80764, max_new_tokens=2000, seq_start=True, seq_end=True, step=0, prep=True
2024-11-05 10:04:50,396 - lmdeploy - INFO - Register stream callback for 9
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
INFO:     36.32.8.98:59075 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-11-05 10:04:50,589 - lmdeploy - INFO - session_id=10, history_tokens=0, input_tokens=86606, max_new_tokens=2000, seq_start=True, seq_end=True, step=0, prep=True
2024-11-05 10:04:50,589 - lmdeploy - INFO - Register stream callback for 10
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 9 received.
[TM][INFO] [ProcessInferRequests] Request for 10 received.
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=8192, sum_k=24573, max_q=8191, max_k=85712
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=8192, sum_k=32764, max_q=8191, max_k=85713
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=8192, sum_k=40955, max_q=8191, max_k=85714
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=8192, sum_k=49146, max_q=8191, max_k=85715
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=8192, sum_k=57337, max_q=8191, max_k=85716
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=8192, sum_k=65528, max_q=8191, max_k=85717
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=8192, sum_k=73719, max_q=8191, max_k=85718
[TM][INFO] [Forward] [0, 2), dc=1, pf=1, sum_q=3143, sum_k=76861, max_q=3142, max_k=85719
[TM][INFO] [Forward] [2, 3), dc=0, pf=1, sum_q=47, sum_k=84015, max_q=47, max_k=84015
[TM][INFO] [Forward] [3, 4), dc=0, pf=1, sum_q=78, sum_k=86606, max_q=78, max_k=86606
[TM][INFO] [Forward] [4, 5), dc=0, pf=1, sum_q=380, sum_k=80764, max_q=380, max_k=80764
[TM][INFO] ------------------------- step = 86610 -------------------------
[TM][INFO] ------------------------- step = 86620 -------------------------
[TM][INFO] ------------------------- step = 86630 -------------------------
[TM][INFO] ------------------------- step = 86640 -------------------------
[TM][INFO] ------------------------- step = 86650 -------------------------
[TM][INFO] ------------------------- step = 86660 -------------------------
[TM][INFO] [Interrupt] slot = 4, id = 9
[TM][INFO] [forward] Request completed for 9
[TM][INFO] ------------------------- step = 86670 -------------------------
[TM][INFO] ------------------------- step = 86680 -------------------------
[TM][INFO] ------------------------- step = 86690 -------------------------
[TM][INFO] ------------------------- step = 86700 -------------------------
[TM][INFO] [Interrupt] slot = 1, id = 7
[TM][INFO] [forward] Request completed for 7
[TM][INFO] ------------------------- step = 86710 -------------------------
[TM][INFO] ------------------------- step = 86720 -------------------------
[TM][INFO] ------------------------- step = 86730 -------------------------
[TM][INFO] ------------------------- step = 86740 -------------------------
[TM][INFO] ------------------------- step = 86750 -------------------------
[TM][INFO] ------------------------- step = 86760 -------------------------
[TM][INFO] ------------------------- step = 86770 -------------------------
[TM][INFO] ------------------------- step = 86780 -------------------------
[TM][INFO] ------------------------- step = 86790 -------------------------
[TM][INFO] ------------------------- step = 86800 -------------------------
[TM][INFO] ------------------------- step = 86810 -------------------------
[TM][INFO] ------------------------- step = 86820 -------------------------
[TM][INFO] [Interrupt] slot = 2, id = 10
[TM][INFO] [forward] Request completed for 10
[TM][INFO] ------------------------- step = 85950 -------------------------
[TM][INFO] [Interrupt] slot = 0, id = 6
[TM][INFO] [forward] Request completed for 6
[TM][INFO] ------------------------- step = 84250 -------------------------
2024-11-05 10:05:17,578 - lmdeploy - INFO - UN-register stream callback for 9
[TM][INFO] ------------------------- step = 84260 -------------------------
2024-11-05 10:05:17,587 - lmdeploy - INFO - UN-register stream callback for 6
2024-11-05 10:05:17,594 - lmdeploy - INFO - UN-register stream callback for 7
2024-11-05 10:05:17,688 - lmdeploy - INFO - UN-register stream callback for 10
[TM][INFO] ------------------------- step = 84270 -------------------------
[TM][INFO] ------------------------- step = 84280 -------------------------
[TM][INFO] ------------------------- step = 84290 -------------------------
[TM][INFO] ------------------------- step = 84300 -------------------------
[TM][INFO] ------------------------- step = 84310 -------------------------
[TM][INFO] ------------------------- step = 84320 -------------------------
[TM][INFO] ------------------------- step = 84330 -------------------------
[TM][INFO] ------------------------- step = 84340 -------------------------
[TM][INFO] ------------------------- step = 84350 -------------------------
[TM][INFO] ------------------------- step = 84360 -------------------------
[TM][INFO] ------------------------- step = 84370 -------------------------
[TM][INFO] ------------------------- step = 84380 -------------------------
[TM][INFO] ------------------------- step = 84390 -------------------------
[TM][INFO] ------------------------- step = 84400 -------------------------
[TM][INFO] ------------------------- step = 84410 -------------------------
[TM][INFO] ------------------------- step = 84420 -------------------------
[TM][INFO] ------------------------- step = 84430 -------------------------
[TM][INFO] ------------------------- step = 84440 -------------------------
[TM][INFO] ------------------------- step = 84450 -------------------------
[TM][INFO] ------------------------- step = 84460 -------------------------
[TM][INFO] ------------------------- step = 84470 -------------------------
[TM][INFO] ------------------------- step = 84480 -------------------------
[TM][INFO] ------------------------- step = 84490 -------------------------
[TM][INFO] ------------------------- step = 84500 -------------------------
[TM][INFO] ------------------------- step = 84510 -------------------------
[TM][INFO] ------------------------- step = 84520 -------------------------
[TM][INFO] ------------------------- step = 84530 -------------------------
[TM][INFO] ------------------------- step = 84540 -------------------------
[TM][INFO] ------------------------- step = 84550 -------------------------
[TM][INFO] ------------------------- step = 84560 -------------------------
[TM][INFO] ------------------------- step = 84570 -------------------------
[TM][INFO] ------------------------- step = 84580 -------------------------
[TM][INFO] ------------------------- step = 84590 -------------------------
[TM][INFO] ------------------------- step = 84600 -------------------------
[TM][INFO] ------------------------- step = 84610 -------------------------
[TM][INFO] ------------------------- step = 84620 -------------------------
[TM][INFO] ------------------------- step = 84630 -------------------------
[TM][INFO] ------------------------- step = 84640 -------------------------
[TM][INFO] ------------------------- step = 84650 -------------------------
[TM][INFO] ------------------------- step = 84660 -------------------------
[TM][INFO] ------------------------- step = 84670 -------------------------
[TM][INFO] ------------------------- step = 84680 -------------------------
[TM][INFO] ------------------------- step = 84690 -------------------------
[TM][INFO] ------------------------- step = 84700 -------------------------
[TM][INFO] ------------------------- step = 84710 -------------------------
[TM][INFO] ------------------------- step = 84720 -------------------------
[TM][INFO] ------------------------- step = 84730 -------------------------
[TM][INFO] ------------------------- step = 84740 -------------------------
[TM][INFO] ------------------------- step = 84750 -------------------------
[TM][INFO] ------------------------- step = 84760 -------------------------
[TM][INFO] ------------------------- step = 84770 -------------------------
[TM][INFO] ------------------------- step = 84780 -------------------------
[TM][INFO] ------------------------- step = 84790 -------------------------
[TM][INFO] ------------------------- step = 84800 -------------------------
[TM][INFO] ------------------------- step = 84810 -------------------------
[TM][INFO] ------------------------- step = 84820 -------------------------
[TM][INFO] ------------------------- step = 84830 -------------------------
[TM][INFO] ------------------------- step = 84840 -------------------------
[TM][INFO] ------------------------- step = 84850 -------------------------
[TM][INFO] ------------------------- step = 84860 -------------------------
[TM][INFO] ------------------------- step = 84870 -------------------------
[TM][INFO] ------------------------- step = 84880 -------------------------
[TM][INFO] ------------------------- step = 84890 -------------------------
[TM][INFO] ------------------------- step = 84900 -------------------------
[TM][INFO] ------------------------- step = 84910 -------------------------
[TM][INFO] ------------------------- step = 84920 -------------------------
[TM][INFO] ------------------------- step = 84930 -------------------------
[TM][INFO] ------------------------- step = 84940 -------------------------
[TM][INFO] ------------------------- step = 84950 -------------------------
[TM][INFO] ------------------------- step = 84960 -------------------------
[TM][INFO] ------------------------- step = 84970 -------------------------
[TM][INFO] ------------------------- step = 84980 -------------------------
[TM][INFO] ------------------------- step = 84990 -------------------------
[TM][INFO] ------------------------- step = 85000 -------------------------
[TM][INFO] ------------------------- step = 85010 -------------------------
[TM][INFO] ------------------------- step = 85020 -------------------------
[TM][INFO] ------------------------- step = 85030 -------------------------
[TM][INFO] ------------------------- step = 85040 -------------------------
[TM][INFO] ------------------------- step = 85050 -------------------------
[TM][INFO] ------------------------- step = 85060 -------------------------
[TM][INFO] ------------------------- step = 85070 -------------------------
[TM][INFO] ------------------------- step = 85080 -------------------------
[TM][INFO] ------------------------- step = 85090 -------------------------
[TM][INFO] ------------------------- step = 85100 -------------------------
[TM][INFO] ------------------------- step = 85110 -------------------------
[TM][INFO] ------------------------- step = 85120 -------------------------
[TM][INFO] ------------------------- step = 85130 -------------------------
[TM][INFO] ------------------------- step = 85140 -------------------------
[TM][INFO] ------------------------- step = 85150 -------------------------
[TM][INFO] ------------------------- step = 85160 -------------------------
[TM][INFO] ------------------------- step = 85170 -------------------------
[TM][INFO] ------------------------- step = 85180 -------------------------
[TM][INFO] ------------------------- step = 85190 -------------------------
[TM][INFO] ------------------------- step = 85200 -------------------------
[TM][INFO] ------------------------- step = 85210 -------------------------
[TM][INFO] ------------------------- step = 85220 -------------------------
[TM][INFO] ------------------------- step = 85230 -------------------------
[TM][INFO] ------------------------- step = 85240 -------------------------
[TM][INFO] ------------------------- step = 85250 -------------------------
[TM][INFO] ------------------------- step = 85260 -------------------------
[TM][INFO] ------------------------- step = 85270 -------------------------
[TM][INFO] ------------------------- step = 85280 -------------------------
[TM][INFO] ------------------------- step = 85290 -------------------------
[TM][INFO] ------------------------- step = 85300 -------------------------
[TM][INFO] ------------------------- step = 85310 -------------------------
[TM][INFO] ------------------------- step = 85320 -------------------------
[TM][INFO] ------------------------- step = 85330 -------------------------
[TM][INFO] ------------------------- step = 85340 -------------------------
[TM][INFO] ------------------------- step = 85350 -------------------------
[TM][INFO] ------------------------- step = 85360 -------------------------
[TM][INFO] ------------------------- step = 85370 -------------------------
[TM][INFO] ------------------------- step = 85380 -------------------------
[TM][INFO] ------------------------- step = 85390 -------------------------
[TM][INFO] ------------------------- step = 85400 -------------------------
[TM][INFO] ------------------------- step = 85410 -------------------------
[TM][INFO] ------------------------- step = 85420 -------------------------
[TM][INFO] ------------------------- step = 85430 -------------------------
[TM][INFO] ------------------------- step = 85440 -------------------------
[TM][INFO] ------------------------- step = 85450 -------------------------
[TM][INFO] ------------------------- step = 85460 -------------------------
[TM][INFO] ------------------------- step = 85470 -------------------------
[TM][INFO] ------------------------- step = 85480 -------------------------
[TM][INFO] ------------------------- step = 85490 -------------------------
[TM][INFO] ------------------------- step = 85500 -------------------------
[TM][INFO] ------------------------- step = 85510 -------------------------
[TM][INFO] ------------------------- step = 85520 -------------------------
[TM][INFO] ------------------------- step = 85530 -------------------------
[TM][INFO] ------------------------- step = 85540 -------------------------
[TM][INFO] ------------------------- step = 85550 -------------------------
[TM][INFO] ------------------------- step = 85560 -------------------------
[TM][INFO] ------------------------- step = 85570 -------------------------
[TM][INFO] ------------------------- step = 85580 -------------------------
[TM][INFO] ------------------------- step = 85590 -------------------------
[TM][INFO] ------------------------- step = 85600 -------------------------
[TM][INFO] ------------------------- step = 85610 -------------------------
[TM][INFO] ------------------------- step = 85620 -------------------------
[TM][INFO] ------------------------- step = 85630 -------------------------
[TM][INFO] ------------------------- step = 85640 -------------------------
[TM][INFO] ------------------------- step = 85650 -------------------------
[TM][INFO] ------------------------- step = 85660 -------------------------
[TM][INFO] ------------------------- step = 85670 -------------------------
[TM][INFO] ------------------------- step = 85680 -------------------------
[TM][INFO] ------------------------- step = 85690 -------------------------
[TM][INFO] ------------------------- step = 85700 -------------------------
[TM][INFO] ------------------------- step = 85710 -------------------------
[TM][INFO] ------------------------- step = 85720 -------------------------
[TM][INFO] ------------------------- step = 85730 -------------------------
[TM][INFO] ------------------------- step = 85740 -------------------------
[TM][INFO] ------------------------- step = 85750 -------------------------
[TM][INFO] ------------------------- step = 85760 -------------------------
[TM][INFO] ------------------------- step = 85770 -------------------------
[TM][INFO] ------------------------- step = 85780 -------------------------
[TM][INFO] ------------------------- step = 85790 -------------------------
[TM][INFO] ------------------------- step = 85800 -------------------------
[TM][INFO] ------------------------- step = 85810 -------------------------
[TM][INFO] ------------------------- step = 85820 -------------------------
[TM][INFO] ------------------------- step = 85830 -------------------------
[TM][INFO] ------------------------- step = 85840 -------------------------
[TM][INFO] ------------------------- step = 85850 -------------------------
[TM][INFO] ------------------------- step = 85860 -------------------------
[TM][INFO] ------------------------- step = 85870 -------------------------
[TM][INFO] ------------------------- step = 85880 -------------------------
[TM][INFO] ------------------------- step = 85890 -------------------------
[TM][INFO] ------------------------- step = 85900 -------------------------
[TM][INFO] ------------------------- step = 85910 -------------------------
[TM][INFO] ------------------------- step = 85920 -------------------------
[TM][INFO] ------------------------- step = 85930 -------------------------
[TM][INFO] ------------------------- step = 85940 -------------------------
[TM][INFO] ------------------------- step = 85950 -------------------------
[TM][INFO] ------------------------- step = 85960 -------------------------
[TM][INFO] ------------------------- step = 85970 -------------------------
[TM][INFO] ------------------------- step = 85980 -------------------------
[TM][INFO] ------------------------- step = 85990 -------------------------
[TM][INFO] ------------------------- step = 86000 -------------------------
[TM][INFO] ------------------------- step = 86010 -------------------------
[TM][INFO] [Interrupt] slot = 0, id = 8
[TM][INFO] [forward] Request completed for 8
2024-11-05 10:05:38,600 - lmdeploy - INFO - UN-register stream callback for 8
lzhangzz commented 22 hours ago

The discussion in https://github.com/InternLM/lmdeploy/issues/1740#issuecomment-2169249093 may be helpful.