Open 520jefferson opened 3 months ago
max_model_len, tp_size = 100000, 8
vLLM raises an error when the input length is set to 100k:
(RayWorkerWrapper pid=1619) eflops105:1619:3286 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM/read
(RayWorkerWrapper pid=1619) eflops105:1619:3286 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM/read
(RayWorkerWrapper pid=1619) eflops105:1619:3286 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM/read
(RayWorkerWrapper pid=1619) eflops105:1619:3286 [1] NCCL INFO Connected all rings
(RayWorkerWrapper pid=1619) eflops105:1619:3286 [1] NCCL INFO Connected all trees
(RayWorkerWrapper pid=1619) eflops105:1619:3286 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
(RayWorkerWrapper pid=1619) eflops105:1619:3286 [1] NCCL INFO 16 coll channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
(RayWorkerWrapper pid=1619) eflops105:1619:3286 [1] NCCL INFO comm 0x55cab4bf7920 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 16000 commId 0xf51f9b2cdcc8a44a - Init COMPLETE
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] Error executing method determine_num_available_blocks. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] Traceback (most recent call last):
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 137, in execute_method
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] return executor(*args, **kwargs)
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] return func(*args, **kwargs)
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 139, in determine_num_available_blocks
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] self.model_runner.profile_run()
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] return func(*args, **kwargs)
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 890, in profile_run
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] self.execute_model(seqs, kv_caches)
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] return func(*args, **kwargs)
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 810, in execute_model
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] hidden_states = model_executable(**execute_model_kwargs)
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 495, in forward
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] hidden_states = self.model(input_ids, positions, kv_caches,
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 466, in forward
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] hidden_states, residual = layer(positions, hidden_states,
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 403, in forward
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] hidden_states = self.self_attn(
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 318, in forward
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] q[:, :, :self.kv_lora_rank] = q_c.permute(1, 0, 2)
(RayWorkerWrapper pid=1817) ERROR 06-15 16:40:54 worker_base.py:145] RuntimeError: The expanded size of the tensor (65536) must match the existing size (100000) at non-singleton dimension 0. Target sizes: [65536, 16, 512]. Tensor sizes: [100000, 16, 512]
ERROR 06-15 16:40:54 worker_base.py:145] Error executing method determine_num_available_blocks. This might cause deadlock in distributed execution.
ERROR 06-15 16:40:54 worker_base.py:145] Traceback (most recent call last):
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 137, in execute_method
ERROR 06-15 16:40:54 worker_base.py:145] return executor(*args, **kwargs)
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 06-15 16:40:54 worker_base.py:145] return func(*args, **kwargs)
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 139, in determine_num_available_blocks
ERROR 06-15 16:40:54 worker_base.py:145] self.model_runner.profile_run()
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 06-15 16:40:54 worker_base.py:145] return func(*args, **kwargs)
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 890, in profile_run
ERROR 06-15 16:40:54 worker_base.py:145] self.execute_model(seqs, kv_caches)
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 06-15 16:40:54 worker_base.py:145] return func(*args, **kwargs)
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 810, in execute_model
ERROR 06-15 16:40:54 worker_base.py:145] hidden_states = model_executable(**execute_model_kwargs)
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
ERROR 06-15 16:40:54 worker_base.py:145] return self._call_impl(*args, **kwargs)
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
ERROR 06-15 16:40:54 worker_base.py:145] return forward_call(*args, **kwargs)
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 495, in forward
ERROR 06-15 16:40:54 worker_base.py:145] hidden_states = self.model(input_ids, positions, kv_caches,
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
ERROR 06-15 16:40:54 worker_base.py:145] return self._call_impl(*args, **kwargs)
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
ERROR 06-15 16:40:54 worker_base.py:145] return forward_call(*args, **kwargs)
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 466, in forward
ERROR 06-15 16:40:54 worker_base.py:145] hidden_states, residual = layer(positions, hidden_states,
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
ERROR 06-15 16:40:54 worker_base.py:145] return self._call_impl(*args, **kwargs)
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
ERROR 06-15 16:40:54 worker_base.py:145] return forward_call(*args, **kwargs)
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 403, in forward
ERROR 06-15 16:40:54 worker_base.py:145] hidden_states = self.self_attn(
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
ERROR 06-15 16:40:54 worker_base.py:145] return self._call_impl(*args, **kwargs)
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
ERROR 06-15 16:40:54 worker_base.py:145] return forward_call(*args, **kwargs)
ERROR 06-15 16:40:54 worker_base.py:145] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 318, in forward
ERROR 06-15 16:40:54 worker_base.py:145] q[:, :, :self.kv_lora_rank] = q_c.permute(1, 0, 2)
ERROR 06-15 16:40:54 worker_base.py:145] RuntimeError: The expanded size of the tensor (65536) must match the existing size (100000) at non-singleton dimension 0. Target sizes: [65536, 16, 512]. Tensor sizes: [100000, 16, 512]
Traceback (most recent call last):
File "/maindata/data/user/lijie.wang/roleplay/train/deepseek/gen_result/deepseek_v2_chat_testset.py", line 9, in
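For anyone skimming the log: the part that matters is the final RuntimeError. The source tensor has 100000 rows (the same number as max_model_len), while the destination slice only has 65536. Below is a minimal sketch that pulls those numbers out of the message. The helper names `parse_sizes` and `mismatched_dims` are hypothetical, and the sketch does not identify where the 65536 limit comes from.

```python
import re

# Error text copied verbatim from the log above.
ERROR = (
    "RuntimeError: The expanded size of the tensor (65536) must match the "
    "existing size (100000) at non-singleton dimension 0. "
    "Target sizes: [65536, 16, 512]. Tensor sizes: [100000, 16, 512]"
)


def parse_sizes(message: str) -> tuple[list[int], list[int]]:
    """Return (target_sizes, tensor_sizes) from a PyTorch shape-mismatch message."""
    target = re.search(r"Target sizes: \[([^\]]*)\]", message)
    source = re.search(r"Tensor sizes: \[([^\]]*)\]", message)
    if target is None or source is None:
        raise ValueError("message does not contain both size lists")

    def to_ints(match: re.Match) -> list[int]:
        return [int(part) for part in match.group(1).split(",")]

    return to_ints(target), to_ints(source)


def mismatched_dims(target: list[int], source: list[int]) -> list[tuple[int, int, int]]:
    """List (dim, target_size, source_size) for every dimension that differs."""
    return [(i, t, s) for i, (t, s) in enumerate(zip(target, source)) if t != s]


target_sizes, tensor_sizes = parse_sizes(ERROR)
mismatches = mismatched_dims(target_sizes, tensor_sizes)
for dim, t, s in mismatches:
    print(f"dim {dim}: destination holds {t}, source has {s} ({s - t} too many)")
```

Only dimension 0 differs, so the head count (16) and the last dimension (512) already agree; the mismatch is purely in the number of token positions.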
@520jefferson What is the largest max_model_len you have been able to set?
How can the context length be extended from 32k to 128k? Is there an example or a script? Can 8x A100 80GB GPUs run inference at 128k?
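On the memory side of the 128k question, a rough back-of-the-envelope sketch may help. It assumes a plain per-head key/value cache in 16-bit precision, split evenly across tensor-parallel ranks. The function name `kv_cache_bytes_per_gpu` is hypothetical, and the layer count, head count, and head dimension below are illustrative placeholders, not values read from the model's config or from vLLM. Model weights and activation memory are not counted.

```python
def kv_cache_bytes_per_gpu(
    seq_len: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    tp_size: int,
    bytes_per_elem: int = 2,  # assumption: fp16/bf16 cache
) -> int:
    """Estimate KV-cache bytes held by one GPU for a single sequence."""
    if num_kv_heads % tp_size != 0:
        raise ValueError("num_kv_heads must be divisible by tp_size")
    heads_per_gpu = num_kv_heads // tp_size
    # Factor 2 covers keys and values, stored once per layer per position.
    return 2 * num_layers * heads_per_gpu * head_dim * seq_len * bytes_per_elem


# Placeholder numbers for illustration only (assumptions, not measured values).
estimate = kv_cache_bytes_per_gpu(
    seq_len=128 * 1024,
    num_layers=60,
    num_kv_heads=128,
    head_dim=128,
    tp_size=8,
)
print(f"{estimate / 2**30:.1f} GiB per GPU for one 128k sequence")
```

Substitute the real values from the model's config.json and compare the result with what is left on each 80GB card after the weights are loaded. An implementation that caches a compressed representation instead of full per-head keys and values would need far less than this estimate.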