InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Multi-GPU deployment of InternVL2-8B fails with Aborted (core dumped) #2408

Closed gxlover0625 closed 3 months ago

gxlover0625 commented 3 months ago

Checklist

Describe the bug

Problem

I am deploying InternVL2-8B with `pipeline` on a machine with 4 V100 GPUs.

Related issues

I searched the related issues and the closest one is https://github.com/InternLM/lmdeploy/issues/2250#issue-2452254301, but that issue was never resolved. Following the approach suggested there, I set TM_DEBUG_LEVEL=DEBUG and initialized the pipeline with log_level='INFO' (sketched below); the resulting output is included in the Error traceback section. I suspected an NCCL problem, but no NCCL error ever appeared.
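For reference, this is roughly how the extra logging was enabled (a minimal sketch; the model path and engine settings simply mirror the full reproduction script further down):

import os

from lmdeploy import pipeline, TurbomindEngineConfig

# TurboMind reads TM_DEBUG_LEVEL from the environment, so it has to be
# set before the engine is created (equivalent to `export TM_DEBUG_LEVEL=DEBUG`).
os.environ["TM_DEBUG_LEVEL"] = "DEBUG"

# log_level='INFO' turns on lmdeploy's Python-side logging.
pipe = pipeline(
    "/home/admin/workspace/aop_lab/llm/OpenGVLab/InternVL2-8B",
    backend_config=TurbomindEngineConfig(tp=4, session_len=8192),
    log_level="INFO",
)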

Conda environment

absl-py 2.1.0 accelerate 0.33.0 addict 2.4.0 aiofiles 24.1.0 aiohttp 3.9.5 aiosignal 1.3.1 altair 5.3.0 annotated-types 0.7.0 anyio 4.4.0 asttokens 2.4.1 async-timeout 4.0.3 attrs 23.2.0 backcall 0.2.0 beautifulsoup4 4.12.3 bitsandbytes 0.41.0 blinker 1.8.2 cachetools 5.4.0 certifi 2024.7.4 charset-normalizer 3.3.2 click 8.1.7 colorama 0.4.6 comm 0.2.2 contourpy 1.2.1 cycler 0.12.1 datasets 2.20.0 debugpy 1.6.7 decorator 5.1.1 decord 0.6.0 deepspeed 0.13.5 dill 0.3.8 distro 1.9.0 dnspython 2.6.1 einops 0.8.0 einops-exts 0.0.4 email_validator 2.2.0 entrypoints 0.4 et-xmlfile 1.1.0 exceptiongroup 1.2.2 executing 2.0.1 fastapi 0.111.1 fastapi-cli 0.0.4 ffmpy 0.3.3 filelock 3.15.4 fire 0.6.0 fonttools 4.53.1 frozenlist 1.4.1 fsspec 2024.5.0 future 1.0.0 gdown 5.2.0 gitdb 4.0.11 GitPython 3.1.43 gradio 3.35.2 gradio_client 0.2.9 grpcio 1.65.1 h11 0.14.0 hjson 3.1.0 httpcore 0.17.3 httptools 0.6.1 httpx 0.24.0 huggingface-hub 0.24.3 idna 3.7 imageio 2.34.2 importlib_metadata 8.2.0 importlib_resources 6.4.0 ipykernel 6.29.5 ipython 8.12.0 jedi 0.19.1 Jinja2 3.1.4 joblib 1.4.2 jsonschema 4.23.0 jsonschema-specifications 2023.12.1 jupyter-client 7.3.4 jupyter_core 5.7.2 kiwisolver 1.4.5 latex2mathml 3.77.0 linkify-it-py 2.0.3 llava 1.7.0.dev0 lmdeploy 0.5.2.post1 Markdown 3.6 markdown-it-py 2.2.0 markdown2 2.5.0 MarkupSafe 2.1.5 matplotlib 3.9.1 matplotlib-inline 0.1.7 mdit-py-plugins 0.3.3 mdurl 0.1.2 mmcls 0.25.0 mmcv-full 1.6.2 mmengine-lite 0.10.4 mmsegmentation 0.30.0 model-index 0.1.11 mpmath 1.3.0 multidict 6.0.5 multiprocess 0.70.16 nest_asyncio 1.6.0 networkx 3.2.1 ninja 1.11.1.1 numpy 1.26.4 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu12 2.19.3 nvidia-nvjitlink-cu12 12.5.82 nvidia-nvtx-cu12 12.1.105 openai 1.37.1 opencv-python 4.10.0.84 opendatalab 0.0.10 openmim 0.3.9 openpyxl 3.1.5 openxlab 0.0.11 ordered-set 4.1.0 orjson 3.10.6 packaging 24.1 pandas 2.2.2 parso 0.8.4 peft 0.11.1 pexpect 4.9.0 pickleshare 0.7.5 Pillow 9.5.0 pip 24.0 platformdirs 4.2.2 prettytable 3.10.2 prompt_toolkit 3.0.47 protobuf 4.25.4 psutil 5.9.0 ptyprocess 0.7.0 pure_eval 0.2.3 py-cpuinfo 9.0.0 pyarrow 17.0.0 pyarrow-hotfix 0.6 pycocoevalcap 1.2 pycocotools 2.0.8 pycryptodome 3.20.0 pydantic 2.8.2 pydantic_core 2.20.1 pydeck 0.9.1 pydub 0.25.1 Pygments 2.18.0 pynvml 11.5.3 pyodps 0.11.6.2 pyparsing 3.1.2 PySocks 1.7.1 python-dateutil 2.9.0 python-dotenv 1.0.1 python-multipart 0.0.9 pytz 2024.1 PyYAML 6.0.1 pyzmq 25.1.2 referencing 0.35.1 regex 2024.7.24 requests 2.32.3 rich 13.7.1 rpds-py 0.19.1 safetensors 0.4.3 scikit-learn 1.5.1 scipy 1.13.1 semantic-version 2.10.0 sentencepiece 0.1.99 setuptools 69.5.1 shellingham 1.5.4 shortuuid 1.0.13 six 1.16.0 smmap 5.0.1 sniffio 1.3.1 soupsieve 2.5 stack-data 0.6.2 starlette 0.37.2 streamlit 1.37.0 streamlit-image-select 0.6.0 svgwrite 1.4.3 sympy 1.13.1 tabulate 0.9.0 tenacity 8.5.0 tensorboard 2.17.0 tensorboard-data-server 0.7.2 tensorboardX 2.6.2.2 termcolor 2.4.0 threadpoolctl 3.5.0 tiktoken 0.7.0 timm 0.9.12 tokenizers 0.19.1 toml 0.10.2 tomli 2.0.1 toolz 0.12.1 torch 2.2.2 torchvision 0.17.2 tornado 6.1 tqdm 4.66.4 traitlets 5.14.3 transformers 4.43.3 transformers-stream-generator 0.0.5 triton 2.2.0 typer 0.12.3 typing_extensions 4.12.2 tzdata 2024.1 uc-micro-py 1.0.3 urllib3 2.2.2 uvicorn 0.30.3 uvloop 0.19.0 watchdog 
4.0.1 watchfiles 0.22.0 wavedrom 2.0.3.post3 wcwidth 0.2.13 websockets 12.0 Werkzeug 3.0.3 wheel 0.43.0 xxhash 3.4.1 yacs 0.1.8 yapf 0.40.1 yarl 1.9.4 zipp 3.19.2

Reproduction

Full code

import nest_asyncio
import torch

from lmdeploy import pipeline, TurbomindEngineConfig, ChatTemplateConfig, GenerationConfig
from lmdeploy.vl import load_image
from lmdeploy.vl.constants import IMAGE_TOKEN

nest_asyncio.apply()

model_path = "/home/admin/workspace/aop_lab/llm/OpenGVLab/InternVL2-8B"
system_prompt = '我是书生·万象,英文名是InternVL,是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型。'
chat_template_config = ChatTemplateConfig('internvl-internlm2')
chat_template_config.meta_instruction = system_prompt
gen_config = GenerationConfig(max_new_tokens=512)

model = pipeline(
    model_path,
    chat_template_config=chat_template_config,
    backend_config=TurbomindEngineConfig(
        tp=torch.cuda.device_count(),
        session_len=8192,
        cache_max_entry_count=0.8,
    ),
    log_level='INFO',
)
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = model(('describe this image', image))
print(response.text)
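One small note on the script above: gen_config is constructed but never passed to the pipeline call, so the generation settings may fall back to defaults. Assuming the pipeline call accepts a gen_config argument as shown in the lmdeploy documentation, the intended usage would look like this (a sketch, not part of the original reproduction):

# Pass the GenerationConfig explicitly so max_new_tokens=512 actually applies.
response = model(('describe this image', image), gen_config=gen_config)
print(response.text)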

Environment

sys.platform: linux
Python: 3.9.19 (main, May  6 2024, 19:43:03) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3: Tesla V100-SXM2-32GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (GCC) 10.2.1 20200825 (Alibaba 10.2.1-3 2.17)
PyTorch: 2.2.2+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.17.2+cu121
LMDeploy: 0.5.2.post1+
transformers: 4.43.3
gradio: 3.35.2
fastapi: 0.111.1
pydantic: 2.8.2
triton: 2.2.0
NVIDIA Topology: 
        GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV2     NV1     NV2     0-47    0               N/A
GPU1    NV2      X      NV2     NV1     0-47    0               N/A
GPU2    NV1     NV2      X      NV1     0-47    0               N/A
GPU3    NV2     NV1     NV1      X      0-47    0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

Error traceback

# Console output without NCCL_DEBUG enabled
2024-09-01 18:23:09,551 - lmdeploy - INFO - Using turbomind engine
2024-09-01 18:23:10,179 - lmdeploy - INFO - matching vision model: InternVLVisionModel
FlashAttention is not installed.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash attention is not available, using eager attention instead.
2024-09-01 18:23:12,995 - lmdeploy - INFO - using InternVL-Chat-V1-5 vision preprocess                                  
2024-09-01 18:23:12,996 - lmdeploy - INFO - input backend=turbomind, backend_config=TurbomindEngineConfig(model_name=None, model_format=None, tp=4, session_len=8192, max_batch_size=128, cache_max_entry_count=0.8, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
2024-09-01 18:23:12,996 - lmdeploy - INFO - input chat_template_config=ChatTemplateConfig(model_name='internvl-internlm2', system=None, meta_instruction='我是书生·万象,英文名是InternVL,是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型。', eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability=None, stop_words=None)
2024-09-01 18:23:13,002 - lmdeploy - INFO - updated chat_template_onfig=ChatTemplateConfig(model_name='internvl-internlm2', system=None, meta_instruction='我是书生·万象,英文名是InternVL,是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型。', eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability=None, stop_words=None)
2024-09-01 18:23:13,003 - lmdeploy - INFO - model_source: hf_model
Device does not support bfloat16. Set float16 forcefully
2024-09-01 18:23:13,906 - lmdeploy - INFO - model_config:

[llama]
model_name = internvl2-internlm2
model_arch = InternVLChatModel
tensor_para_size = 4
head_num = 32
kv_head_num = 8
vocab_size = 92553
num_layer = 32
inter_size = 14336
norm_eps = 1e-05
attn_bias = 0
start_id = 1
end_id = 2
session_len = 8192
weight_type = fp16
rotary_embedding = 128
rope_theta = 1000000.0
size_per_head = 128
group_size = 0
max_batch_size = 128
max_context_token_num = 1
step_length = 1
cache_max_entry_count = 0.8
cache_block_seq_len = 64
cache_chunk_size = -1
enable_prefix_caching = False
num_tokens_per_iter = 8192
max_prefill_iters = 1
extra_tokens_per_iter = 0
use_context_fmha = 1
quant_policy = 0
max_position_embeddings = 32768
original_max_position_embeddings = 0
rope_scaling_type = 
rope_scaling_factor = 2.0
use_dynamic_ntk = 1
low_freq_factor = 1.0
high_freq_factor = 1.0
use_logn_attn = 0
lora_policy = 
lora_r = 0
lora_scale = 0.0
lora_max_wo_r = 0
lora_rank_pattern = 
lora_scale_pattern = 

[TM][WARNING] [LlamaTritonModel] `max_context_token_num` = 8192.
[TM][WARNING] pad vocab size from 92553 to 92556
[TM][WARNING] pad vocab size from 92553 to 92556
[TM][WARNING] pad vocab size from 92553 to 92556
[TM][WARNING] pad vocab size from 92553 to 92556
2024-09-01 18:23:15,582 - lmdeploy - WARNING - get 707 model params
2024-09-01 18:23:21,288 - lmdeploy - INFO - updated backend_config=TurbomindEngineConfig(model_name=None, model_format=None, tp=4, session_len=8192, max_batch_size=128, cache_max_entry_count=0.8, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
[TM][INFO] NCCL group_id = 0
[TM][INFO] NCCL group_id = 0
[WARNING] gemm_config.in is not found; using default GEMM algo
[TM][INFO] NCCL group_id = 0
[WARNING] gemm_config.in is not found; using default GEMM algo
[TM][INFO] NCCL group_id = 0
[TM][INFO] [BlockManager] block_size = 2 MB
[TM][INFO] [BlockManager] max_block_count = 10683
[TM][INFO] [BlockManager] block_size = 2 MB
[TM][INFO] [BlockManager] max_block_count = 10683
[TM][INFO] [BlockManager] chunk_size = 10683
[TM][INFO] [BlockManager] block_size = 2 MB
[TM][INFO] [BlockManager] max_block_count = 10683
[TM][INFO] [BlockManager] chunk_size = 10683
[TM][INFO] [BlockManager] block_size = 2 MB
[TM][INFO] [BlockManager] max_block_count = 10683
[TM][INFO] [BlockManager] chunk_size = 10683
[TM][INFO] [BlockManager] chunk_size = 10683
[TM][INFO] LlamaBatch<T>::Start()
[TM][INFO] LlamaBatch<T>::Start()
[TM][INFO] LlamaBatch<T>::Start()
[TM][INFO] LlamaBatch<T>::Start()
2024-09-01 18:23:22,382 - lmdeploy - INFO - start ImageEncoder._forward_loop
2024-09-01 18:23:22,382 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images.
2024-09-01 18:23:22,382 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images.
2024-09-01 18:23:22,904 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 0.522s
2024-09-01 18:23:22,905 - lmdeploy - INFO - ImageEncoder done 1 images, left 0 images.
2024-09-01 18:23:22,906 - lmdeploy - INFO - prompt='<|im_start|>system\n我是书生·万象,英文名是InternVL,是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型。<|im_end|>\n<|im_start|>user\n<img><IMAGE_TOKEN></img>\ndescribe this image<|im_end|>\n<|im_start|>assistant\n', gen_config=EngineGenerationConfig(n=1, max_new_tokens=512, top_p=0.8, top_k=40, temperature=0.8, repetition_penalty=1.0, ignore_eos=False, random_seed=13761048767787600089, stop_words=[92542, 92540], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[1, 92543, 9081, 364, 68734, 60628, 60384, 60721, 60775, 60978, 60353, 79448, 60357, 1214, 1070, 30924, 60353, 69643, 68589, 76659, 71581, 60359, 77859, 60543, 75438, 68558, 68542, 69504, 68640, 71434, 60838, 60921, 60368, 68790, 70218, 60355, 92542, 364, 92543, 1008, 364, 92544, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 92545, 364, 12483, 550, 2321, 92542, 364, 92543, 525, 11353, 364], adapter_name=None.
2024-09-01 18:23:22,906 - lmdeploy - INFO - session_id=0, history_tokens=0, input_tokens=1845, max_new_tokens=512, seq_start=True, seq_end=True, step=0, prep=True
2024-09-01 18:23:22,906 - lmdeploy - INFO - Register stream callback for 0
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 0 received.
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 1845, max_q = 1845, max_k = 1845
Aborted (core dumped)

# Console output with export NCCL_DEBUG=INFO
2024-09-01 18:28:21,551 - lmdeploy - INFO - Using turbomind engine
2024-09-01 18:28:22,282 - lmdeploy - INFO - matching vision model: InternVLVisionModel
FlashAttention is not installed.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash Attention is not available, use_flash_attn is set to False.
Warning: Flash attention is not available, using eager attention instead.
2024-09-01 18:28:25,184 - lmdeploy - INFO - using InternVL-Chat-V1-5 vision preprocess                                                               
2024-09-01 18:28:25,185 - lmdeploy - INFO - input backend=turbomind, backend_config=TurbomindEngineConfig(model_name=None, model_format=None, tp=4, session_len=8192, max_batch_size=128, cache_max_entry_count=0.8, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
2024-09-01 18:28:25,185 - lmdeploy - INFO - input chat_template_config=ChatTemplateConfig(model_name='internvl-internlm2', system=None, meta_instruction='我是书生·万象,英文名是InternVL,是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型。', eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability=None, stop_words=None)
2024-09-01 18:28:25,192 - lmdeploy - INFO - updated chat_template_onfig=ChatTemplateConfig(model_name='internvl-internlm2', system=None, meta_instruction='我是书生·万象,英文名是InternVL,是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型。', eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability=None, stop_words=None)
2024-09-01 18:28:25,192 - lmdeploy - INFO - model_source: hf_model
Device does not support bfloat16. Set float16 forcefully
2024-09-01 18:28:26,128 - lmdeploy - INFO - model_config:

[llama]
model_name = internvl2-internlm2
model_arch = InternVLChatModel
tensor_para_size = 4
head_num = 32
kv_head_num = 8
vocab_size = 92553
num_layer = 32
inter_size = 14336
norm_eps = 1e-05
attn_bias = 0
start_id = 1
end_id = 2
session_len = 8192
weight_type = fp16
rotary_embedding = 128
rope_theta = 1000000.0
size_per_head = 128
group_size = 0
max_batch_size = 128
max_context_token_num = 1
step_length = 1
cache_max_entry_count = 0.8
cache_block_seq_len = 64
cache_chunk_size = -1
enable_prefix_caching = False
num_tokens_per_iter = 8192
max_prefill_iters = 1
extra_tokens_per_iter = 0
use_context_fmha = 1
quant_policy = 0
max_position_embeddings = 32768
original_max_position_embeddings = 0
rope_scaling_type = 
rope_scaling_factor = 2.0
use_dynamic_ntk = 1
low_freq_factor = 1.0
high_freq_factor = 1.0
use_logn_attn = 0
lora_policy = 
lora_r = 0
lora_scale = 0.0
lora_max_wo_r = 0
lora_rank_pattern = 
lora_scale_pattern = 

[TM][WARNING] [LlamaTritonModel] `max_context_token_num` = 8192.
serverless-033076212002:138828:138828 [0] NCCL INFO Bootstrap : Using eth0:33.76.212.2<0>
serverless-033076212002:138828:138828 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
serverless-033076212002:138828:138828 [0] NCCL INFO cudaDriverVersion 12020
NCCL version 2.19.3+cuda12.3
serverless-033076212002:138828:139188 [0] NCCL INFO Failed to open libibverbs.so[.1]
serverless-033076212002:138828:139188 [0] NCCL INFO NET/Socket : Using [0]eth0:33.76.212.2<0>
serverless-033076212002:138828:139188 [0] NCCL INFO Using non-device net plugin version 0
serverless-033076212002:138828:139188 [0] NCCL INFO Using network Socket
serverless-033076212002:138828:139191 [3] NCCL INFO Using non-device net plugin version 0
serverless-033076212002:138828:139191 [3] NCCL INFO Using network Socket
serverless-033076212002:138828:139189 [1] NCCL INFO Using non-device net plugin version 0
serverless-033076212002:138828:139189 [1] NCCL INFO Using network Socket
serverless-033076212002:138828:139190 [2] NCCL INFO Using non-device net plugin version 0
serverless-033076212002:138828:139190 [2] NCCL INFO Using network Socket
serverless-033076212002:138828:139188 [0] NCCL INFO comm 0xf386e40 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 80 commId 0xeaf16eb88e2b9911 - Init START
serverless-033076212002:138828:139189 [1] NCCL INFO comm 0xf2000a0 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 90 commId 0xeaf16eb88e2b9911 - Init START
serverless-033076212002:138828:139191 [3] NCCL INFO comm 0xd6c2c10 rank 3 nranks 4 cudaDev 3 nvmlDev 3 busId b0 commId 0xeaf16eb88e2b9911 - Init START
serverless-033076212002:138828:139190 [2] NCCL INFO comm 0xf561080 rank 2 nranks 4 cudaDev 2 nvmlDev 2 busId a0 commId 0xeaf16eb88e2b9911 - Init START
serverless-033076212002:138828:139188 [0] NCCL INFO NVLS multicast support is not available on dev 0
serverless-033076212002:138828:139190 [2] NCCL INFO NVLS multicast support is not available on dev 2
serverless-033076212002:138828:139189 [1] NCCL INFO NVLS multicast support is not available on dev 1
serverless-033076212002:138828:139191 [3] NCCL INFO NVLS multicast support is not available on dev 3
serverless-033076212002:138828:139191 [3] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] 2/-1/-1->3->1 [2] 1/-1/-1->3->0 [3] -1/-1/-1->3->0 [4] -1/-1/-1->3->2 [5] 2/-1/-1->3->1 [6] 1/-1/-1->3->0 [7] -1/-1/-1->3->0
serverless-033076212002:138828:139189 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 3/-1/-1->1->0 [2] 2/-1/-1->1->3 [3] 0/-1/-1->1->2 [4] 2/-1/-1->1->0 [5] 3/-1/-1->1->0 [6] 2/-1/-1->1->3 [7] 0/-1/-1->1->2
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 00/08 :    0   1   2   3
serverless-033076212002:138828:139189 [1] NCCL INFO P2P Chunksize set to 524288
serverless-033076212002:138828:139191 [3] NCCL INFO P2P Chunksize set to 524288
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 01/08 :    0   3   1   2
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 02/08 :    0   3   2   1
serverless-033076212002:138828:139190 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] -1/-1/-1->2->3 [2] -1/-1/-1->2->1 [3] 1/-1/-1->2->-1 [4] 3/-1/-1->2->1 [5] -1/-1/-1->2->3 [6] -1/-1/-1->2->1 [7] 1/-1/-1->2->-1
serverless-033076212002:138828:139190 [2] NCCL INFO P2P Chunksize set to 524288
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 03/08 :    0   2   1   3
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 04/08 :    0   1   2   3
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 05/08 :    0   3   1   2
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 06/08 :    0   3   2   1
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 07/08 :    0   2   1   3
serverless-033076212002:138828:139188 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 3/-1/-1->0->-1 [3] 3/-1/-1->0->1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 3/-1/-1->0->-1 [7] 3/-1/-1->0->1
serverless-033076212002:138828:139188 [0] NCCL INFO P2P Chunksize set to 524288
serverless-033076212002:138828:139191 [3] NCCL INFO Channel 00/0 : 3[3] -> 0[0] via P2P/direct pointer
serverless-033076212002:138828:139191 [3] NCCL INFO Channel 03/0 : 3[3] -> 0[0] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/direct pointer
serverless-033076212002:138828:139191 [3] NCCL INFO Channel 04/0 : 3[3] -> 0[0] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/direct pointer
serverless-033076212002:138828:139191 [3] NCCL INFO Channel 07/0 : 3[3] -> 0[0] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/direct pointer
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/direct pointer
serverless-033076212002:138828:139190 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/direct pointer
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/direct pointer
serverless-033076212002:138828:139190 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/direct pointer
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 03/0 : 0[0] -> 2[2] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 03/0 : 1[1] -> 3[3] via P2P/direct pointer
serverless-033076212002:138828:139191 [3] NCCL INFO Channel 01/0 : 3[3] -> 1[1] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 07/0 : 1[1] -> 3[3] via P2P/direct pointer
serverless-033076212002:138828:139190 [2] NCCL INFO Channel 01/0 : 2[2] -> 0[0] via P2P/direct pointer
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 07/0 : 0[0] -> 2[2] via P2P/direct pointer
serverless-033076212002:138828:139191 [3] NCCL INFO Channel 05/0 : 3[3] -> 1[1] via P2P/direct pointer
serverless-033076212002:138828:139190 [2] NCCL INFO Channel 05/0 : 2[2] -> 0[0] via P2P/direct pointer
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 01/0 : 0[0] -> 3[3] via P2P/direct pointer
serverless-033076212002:138828:139190 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/direct pointer
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 02/0 : 0[0] -> 3[3] via P2P/direct pointer
serverless-033076212002:138828:139190 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/direct pointer
serverless-033076212002:138828:139191 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/direct pointer
serverless-033076212002:138828:139190 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/direct pointer
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 05/0 : 0[0] -> 3[3] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/direct pointer
serverless-033076212002:138828:139191 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/direct pointer
serverless-033076212002:138828:139190 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/direct pointer
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 06/0 : 0[0] -> 3[3] via P2P/direct pointer
serverless-033076212002:138828:139190 [2] NCCL INFO Connected all rings
serverless-033076212002:138828:139188 [0] NCCL INFO Connected all rings
serverless-033076212002:138828:139189 [1] NCCL INFO Connected all rings
serverless-033076212002:138828:139191 [3] NCCL INFO Connected all rings
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/direct pointer
serverless-033076212002:138828:139191 [3] NCCL INFO Channel 02/0 : 3[3] -> 0[0] via P2P/direct pointer
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/direct pointer
serverless-033076212002:138828:139191 [3] NCCL INFO Channel 06/0 : 3[3] -> 0[0] via P2P/direct pointer
serverless-033076212002:138828:139190 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/direct pointer
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/direct pointer
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/direct pointer
serverless-033076212002:138828:139190 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 01/0 : 1[1] -> 3[3] via P2P/direct pointer
serverless-033076212002:138828:139191 [3] NCCL INFO Channel 02/0 : 3[3] -> 1[1] via P2P/direct pointer
serverless-033076212002:138828:139191 [3] NCCL INFO Channel 06/0 : 3[3] -> 1[1] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 02/0 : 1[1] -> 3[3] via P2P/direct pointer
serverless-033076212002:138828:139190 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 05/0 : 1[1] -> 3[3] via P2P/direct pointer
serverless-033076212002:138828:139190 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 06/0 : 1[1] -> 3[3] via P2P/direct pointer
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 03/0 : 0[0] -> 3[3] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/direct pointer
serverless-033076212002:138828:139191 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/direct pointer
serverless-033076212002:138828:139191 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/direct pointer
serverless-033076212002:138828:139191 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/direct pointer
serverless-033076212002:138828:139191 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/direct pointer
serverless-033076212002:138828:139189 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/direct pointer
serverless-033076212002:138828:139188 [0] NCCL INFO Channel 07/0 : 0[0] -> 3[3] via P2P/direct pointer
serverless-033076212002:138828:139188 [0] NCCL INFO Connected all trees
serverless-033076212002:138828:139188 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
serverless-033076212002:138828:139188 [0] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer
serverless-033076212002:138828:139189 [1] NCCL INFO Connected all trees
serverless-033076212002:138828:139189 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
serverless-033076212002:138828:139189 [1] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer
serverless-033076212002:138828:139191 [3] NCCL INFO Connected all trees
serverless-033076212002:138828:139190 [2] NCCL INFO Connected all trees
serverless-033076212002:138828:139191 [3] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
serverless-033076212002:138828:139190 [2] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512
serverless-033076212002:138828:139190 [2] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer
serverless-033076212002:138828:139191 [3] NCCL INFO 8 coll channels, 0 nvls channels, 8 p2p channels, 2 p2p channels per peer
serverless-033076212002:138828:139191 [3] NCCL INFO comm 0xd6c2c10 rank 3 nranks 4 cudaDev 3 nvmlDev 3 busId b0 commId 0xeaf16eb88e2b9911 - Init COMPLETE
serverless-033076212002:138828:139188 [0] NCCL INFO comm 0xf386e40 rank 0 nranks 4 cudaDev 0 nvmlDev 0 busId 80 commId 0xeaf16eb88e2b9911 - Init COMPLETE
serverless-033076212002:138828:139189 [1] NCCL INFO comm 0xf2000a0 rank 1 nranks 4 cudaDev 1 nvmlDev 1 busId 90 commId 0xeaf16eb88e2b9911 - Init COMPLETE
serverless-033076212002:138828:139190 [2] NCCL INFO comm 0xf561080 rank 2 nranks 4 cudaDev 2 nvmlDev 2 busId a0 commId 0xeaf16eb88e2b9911 - Init COMPLETE
[TM][WARNING] pad vocab size from 92553 to 92556
[TM][WARNING] pad vocab size from 92553 to 92556
[TM][WARNING] pad vocab size from 92553 to 92556
[TM][WARNING] pad vocab size from 92553 to 92556
2024-09-01 18:28:27,830 - lmdeploy - WARNING - get 707 model params
2024-09-01 18:28:33,551 - lmdeploy - INFO - updated backend_config=TurbomindEngineConfig(model_name=None, model_format=None, tp=4, session_len=8192, max_batch_size=128, cache_max_entry_count=0.8, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
[WARNING] gemm_config.in is not found; using default GEMM algo
[WARNING] gemm_config.in is not found; using default GEMM algo
[TM][INFO] NCCL group_id = 0
[TM][INFO] NCCL group_id = 0
[WARNING] gemm_config.in is not found; using default GEMM algo
[TM][INFO] NCCL group_id = 0
[WARNING] gemm_config.in is not found; using default GEMM algo
[TM][INFO] NCCL group_id = 0
[TM][INFO] [BlockManager] block_size = 2 MB
[TM][INFO] [BlockManager] max_block_count = 10683
[TM][INFO] [BlockManager] block_size = 2 MB
[TM][INFO] [BlockManager] max_block_count = 10683
[TM][INFO] [BlockManager] chunk_size = 10683
[TM][INFO] [BlockManager] block_size = 2 MB
[TM][INFO] [BlockManager] max_block_count = 10683
[TM][INFO] [BlockManager] chunk_size = 10683
[TM][INFO] [BlockManager] block_size = 2 MB
[TM][INFO] [BlockManager] max_block_count = 10683
[TM][INFO] [BlockManager] chunk_size = 10683
[TM][INFO] [BlockManager] chunk_size = 10683
[TM][INFO] LlamaBatch<T>::Start()
[TM][INFO] LlamaBatch<T>::Start()
[TM][INFO] LlamaBatch<T>::Start()
[TM][INFO] LlamaBatch<T>::Start()
2024-09-01 18:28:34,619 - lmdeploy - INFO - start ImageEncoder._forward_loop
2024-09-01 18:28:34,620 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images.
2024-09-01 18:28:34,620 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images.
2024-09-01 18:28:35,133 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 0.514s
2024-09-01 18:28:35,134 - lmdeploy - INFO - ImageEncoder done 1 images, left 0 images.
2024-09-01 18:28:35,135 - lmdeploy - INFO - prompt='<|im_start|>system\n我是书生·万象,英文名是InternVL,是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型。<|im_end|>\n<|im_start|>user\n<img><IMAGE_TOKEN></img>\ndescribe this image<|im_end|>\n<|im_start|>assistant\n', gen_config=EngineGenerationConfig(n=1, max_new_tokens=512, top_p=0.8, top_k=40, temperature=0.8, repetition_penalty=1.0, ignore_eos=False, random_seed=13918566462064088141, stop_words=[92542, 92540], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[1, 92543, 9081, 364, 68734, 60628, 60384, 60721, 60775, 60978, 60353, 79448, 60357, 1214, 1070, 30924, 60353, 69643, 68589, 76659, 71581, 60359, 77859, 60543, 75438, 68558, 68542, 69504, 68640, 71434, 60838, 60921, 60368, 68790, 70218, 60355, 92542, 364, 92543, 1008, 364, 92544, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 92545, 364, 12483, 550, 2321, 92542, 364, 92543, 525, 11353, 364], adapter_name=None.
2024-09-01 18:28:35,136 - lmdeploy - INFO - session_id=0, history_tokens=0, input_tokens=1845, max_new_tokens=512, seq_start=True, seq_end=True, step=0, prep=True
2024-09-01 18:28:35,136 - lmdeploy - INFO - Register stream callback for 0
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 0 received.
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 1845, max_q = 1845, max_k = 1845
Aborted (core dumped)
gxlover0625 commented 3 months ago

I solved the problem with pip install -U lmdeploy, which upgraded lmdeploy to 0.5.3. The one checklist item I had skipped was the second task, updating to the latest code, haha. I'll pay more attention to the developers' kind reminders in the future.
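For anyone hitting the same crash, a quick way to confirm which lmdeploy build is actually active after the upgrade (a minimal sketch using only the standard library):

from importlib.metadata import version

# Should print 0.5.3 or later after `pip install -U lmdeploy`.
print(version("lmdeploy"))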