OpenBMB / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: MiniCPM-Llama3-V 2.5 fails under vllm with AttributeError: 'list' object has no attribute 'to' #6

Closed: renjingneng closed this issue 1 month ago

renjingneng commented 1 month ago

Your current environment

I cannot get MiniCPM-Llama3-V 2.5 to work at all. Running minicpmv_example.py fails with: AttributeError: 'list' object has no attribute 'to'

Environment:

Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.30.1
Libc version: glibc-2.35

Python version: 3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-112-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.1.66
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA L40
GPU 1: NVIDIA L40
GPU 2: NVIDIA L40
GPU 3: NVIDIA L40

Nvidia driver version: 550.54.15
cuDNN version: Probably one of the following:
/root/anaconda3/envs/yolov8/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn.so.8
/root/anaconda3/envs/yolov8/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn_cnn_infer.so.8
/root/anaconda3/envs/yolov8/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn_cnn_train.so.8
/root/anaconda3/envs/yolov8/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn_ops_infer.so.8
/root/anaconda3/envs/yolov8/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn_ops_train.so.8
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      45 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             40
On-line CPU(s) list:                0-39
Vendor ID:                          GenuineIntel
Model name:                         Intel(R) Xeon(R) Silver 4316 CPU @ 2.30GHz
CPU family:                         6
Model:                              106
Thread(s) per core:                 1
Core(s) per socket:                 20
Socket(s):                          2
Stepping:                           6
BogoMIPS:                           4589.21
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm md_clear flush_l1d arch_capabilities
Hypervisor vendor:                  VMware
Virtualization type:                full
L1d cache:                          1.9 MiB (40 instances)
L1i cache:                          1.3 MiB (40 instances)
L2 cache:                           50 MiB (40 instances)
L3 cache:                           60 MiB (2 instances)
NUMA node(s):                       2
NUMA node0 CPU(s):                  0-19
NUMA node1 CPU(s):                  20-39
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit:        KVM: Mitigation: VMX unsupported
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Enhanced IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI Syscall hardening, KVM SW loop
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] numpy==1.24.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] torch==2.3.1
[pip3] torchvision==0.18.1
[pip3] transformers==4.42.4
[pip3] triton==2.3.1
[conda] numpy                     1.24.4                   pypi_0    pypi
[conda] nvidia-nccl-cu12          2.20.5                   pypi_0    pypi
[conda] torch                     2.3.1                    pypi_0    pypi
[conda] torchvision               0.18.1                   pypi_0    pypi
[conda] transformers              4.42.4                   pypi_0    pypi
[conda] triton                    2.3.1                    pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.5.2
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PHB     PHB     PHB     0-39    0-1             N/A
GPU1    PHB      X      PHB     PHB     0-39    0-1             N/A
GPU2    PHB     PHB      X      PHB     0-39    0-1             N/A
GPU3    PHB     PHB     PHB      X      0-39    0-1             N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

How would you like to use vllm

I want to run inference with MiniCPM-Llama3-V 2.5.

renjingneng commented 1 month ago

The full error output is below.


AttributeError                            Traceback (most recent call last)
Cell In[1], line 11
      6 os.environ['CUDA_VISIBLE_DEVICES'] = '3'
      8 MODEL_NAME = "/aixunlian/renjingneng/tasks/task_6/model/openbmb-MiniCPM-Llama3-V-2_5"
---> 11 llm = LLM(model=MODEL_NAME,
     12           gpu_memory_utilization=1,
     13           trust_remote_code=True,
     14           max_model_len=4096)
     15 exit()
     16 IMAGES = [
     17     "/aixunlian/renjingneng/tasks/task_6/MiniCPM-V/assets/airplane.jpeg",
     18 ]

File ~/anaconda3/envs/MiniCPMV/lib/python3.10/site-packages/vllm/entrypoints/llm.py:156, in LLM.__init__(self, model, tokenizer, tokenizer_mode, skip_tokenizer_init, trust_remote_code, tensor_parallel_size, dtype, quantization, revision, tokenizer_revision, seed, gpu_memory_utilization, swap_space, cpu_offload_gb, enforce_eager, max_context_len_to_capture, max_seq_len_to_capture, disable_custom_all_reduce, **kwargs)
    133     raise TypeError(
    134         "There is no need to pass vision-related arguments anymore.")
    135 engine_args = EngineArgs(
    136     model=model,
    137     tokenizer=tokenizer,
   (...)
    154     **kwargs,
    155 )
--> 156 self.llm_engine = LLMEngine.from_engine_args(
    157     engine_args, usage_context=UsageContext.LLM_CLASS)
    158 self.request_counter = Counter()

File ~/anaconda3/envs/MiniCPMV/lib/python3.10/site-packages/vllm/engine/llm_engine.py:426, in LLMEngine.from_engine_args(cls, engine_args, usage_context, stat_loggers)
    424     executor_class = GPUExecutor
    425 # Create the LLM engine.
--> 426 engine = cls(
    427     **engine_config.to_dict(),
    428     executor_class=executor_class,
    429     log_stats=not engine_args.disable_log_stats,
    430     usage_context=usage_context,
    431     stat_loggers=stat_loggers,
    432 )
    434 return engine

File ~/anaconda3/envs/MiniCPMV/lib/python3.10/site-packages/vllm/engine/llm_engine.py:264, in LLMEngine.__init__(self, model_config, cache_config, parallel_config, scheduler_config, device_config, load_config, lora_config, multimodal_config, speculative_config, decoding_config, observability_config, prompt_adapter_config, executor_class, log_stats, usage_context, stat_loggers)
    250 self.model_executor = executor_class(
    251     model_config=model_config,
    252     cache_config=cache_config,
   (...)
    260     prompt_adapter_config=prompt_adapter_config,
    261 )
    263 if not self.model_config.embedding_mode:
--> 264     self._initialize_kv_caches()
    266 # If usage stat is enabled, collect relevant info.
    267 if is_usage_stats_enabled():

File ~/anaconda3/envs/MiniCPMV/lib/python3.10/site-packages/vllm/engine/llm_engine.py:363, in LLMEngine._initialize_kv_caches(self)
    356 def _initialize_kv_caches(self) -> None:
    357     """Initialize the KV cache in the worker(s).
    358
    359     The workers will determine the number of blocks in both the GPU cache
    360     and the swap CPU cache.
    361     """
    362     num_gpu_blocks, num_cpu_blocks = (
--> 363         self.model_executor.determine_num_available_blocks())
    365     if self.cache_config.num_gpu_blocks_override is not None:
    366         num_gpu_blocks_override = self.cache_config.num_gpu_blocks_override

File ~/anaconda3/envs/MiniCPMV/lib/python3.10/site-packages/vllm/executor/gpu_executor.py:92, in GPUExecutor.determine_num_available_blocks(self)
     88 def determine_num_available_blocks(self) -> Tuple[int, int]:
     89     """Determine the number of available KV blocks by invoking the
     90     underlying worker.
     91     """
---> 92     return self.driver_worker.determine_num_available_blocks()

File ~/anaconda3/envs/MiniCPMV/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/anaconda3/envs/MiniCPMV/lib/python3.10/site-packages/vllm/worker/worker.py:179, in Worker.determine_num_available_blocks(self)
    175 torch.cuda.empty_cache()
    177 # Execute a forward pass with dummy inputs to profile the memory usage
    178 # of the model.
--> 179 self.model_runner.profile_run()
    181 # Calculate the number of blocks that can be allocated with the
    182 # profiled peak memory.
    183 torch.cuda.synchronize()

File ~/anaconda3/envs/MiniCPMV/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/anaconda3/envs/MiniCPMV/lib/python3.10/site-packages/vllm/worker/model_runner.py:759, in GPUModelRunnerBase.profile_run(self)
    757 kv_caches = [None] * num_layers
    758 finished_requests_ids = [seq.request_id for seq in seqs]
--> 759 model_input = self.prepare_model_input(
    760     seqs, finished_requests_ids=finished_requests_ids)
    761 intermediate_tensors = None
    762 if not get_pp_group().is_first_rank:

File ~/anaconda3/envs/MiniCPMV/lib/python3.10/site-packages/vllm/worker/model_runner.py:1096, in ModelRunner.prepare_model_input(self, seq_group_metadata_list, virtual_engine, finished_requests_ids)
   1077 def prepare_model_input(
   1078     self,
   1079     seq_group_metadata_list: List[SequenceGroupMetadata],
   1080     virtual_engine: int = 0,
   1081     finished_requests_ids: Optional[List[str]] = None
   1082 ) -> ModelInputForGPUWithSamplingMetadata:
   1083     """Prepare the model input based on a given sequence group, including
   1084     metadata for the sampling step.
   1085
  (...)
   1094     If cuda graph is required, this API automatically pads inputs.
   1095     """
-> 1096     model_input = self._prepare_model_input_tensors(
   1097         seq_group_metadata_list, finished_requests_ids)
   1098     sampling_metadata = SamplingMetadata.prepare(seq_group_metadata_list,
   1099                                                  model_input.seq_lens,
   1100                                                  model_input.query_lens,
   1101                                                  self.device,
   1102                                                  self.pin_memory)
   1103     is_prompt = (seq_group_metadata_list[0].is_prompt
   1104                  if seq_group_metadata_list else None)

File ~/anaconda3/envs/MiniCPMV/lib/python3.10/site-packages/vllm/worker/model_runner.py:672, in GPUModelRunnerBase._prepare_model_input_tensors(self, seq_group_metadata_list, finished_requests_ids)
    670 for seq_group_metadata in seq_group_metadata_list:
    671     builder.add_seq_group(seq_group_metadata)
--> 672 return builder.build()

File ~/anaconda3/envs/MiniCPMV/lib/python3.10/site-packages/vllm/worker/model_runner.py:444, in ModelInputForGPUBuilder.build(self)
    441     prompt_adapter_mapping = None
    443 # Multi-modal data.
--> 444 multi_modal_kwargs = MultiModalInputs.batch(
    445     self.multi_modal_inputs_list, device=self.runner.device)
    447 return self.model_input_cls(
    448     input_tokens=input_tokens_tensor,
    449     input_positions=input_positions_tensor,
   (...)
    458     prompt_adapter_mapping=prompt_adapter_mapping,
    459     prompt_adapter_requests=self.prompt_adapter_requests)

File ~/anaconda3/envs/MiniCPMV/lib/python3.10/site-packages/vllm/multimodal/base.py:87, in MultiModalInputs.batch(inputs_list, device)
     84 for k, v in inputs.items():
     85     item_lists[k].append(v)
---> 87 return {
     88     k: MultiModalInputs.try_concat(item_list, device=device)
     89     for k, item_list in item_lists.items()
     90 }

File ~/anaconda3/envs/MiniCPMV/lib/python3.10/site-packages/vllm/multimodal/base.py:88, in <dictcomp>(.0)
     84 for k, v in inputs.items():
     85     item_lists[k].append(v)
     87 return {
---> 88     k: MultiModalInputs.try_concat(item_list, device=device)
     89     for k, item_list in item_lists.items()
     90 }

File ~/anaconda3/envs/MiniCPMV/lib/python3.10/site-packages/vllm/multimodal/base.py:54, in MultiModalInputs.try_concat(tensors, device)
     52     for new_tensor in tensors:
     53         for new_t in new_tensor:
---> 54             new_tensors.append(new_t.to(device))
     55     return new_tensors
     56 unbatched_shape = tensors[0].shape[1:]

AttributeError: 'list' object has no attribute 'to'
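
The last frame loops over two levels of nesting and calls `.to(device)` on each innermost item, so the error suggests the multimodal input was nested one level deeper than that loop expects. The sketch below reproduces that pattern without PyTorch or vLLM. `FakeTensor`, `two_level_move`, and `recursive_move` are hypothetical names used only for illustration. The recursive variant is one possible way to tolerate deeper nesting and is not the fix that was merged upstream.

```python
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; it only models .to(device)."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return FakeTensor(device)


def two_level_move(batch, device):
    """Same loop shape as the failing frame: expects a list of lists of tensors."""
    moved = []
    for per_request in batch:
        for item in per_request:
            # Raises AttributeError if item is itself a list.
            moved.append(item.to(device))
    return moved


def recursive_move(value, device):
    """Illustrative alternative: walk lists of any depth and keep the nesting."""
    if isinstance(value, (list, tuple)):
        return [recursive_move(v, device) for v in value]
    return value.to(device)


# Two levels of nesting: the shape the loop can handle.
ok_batch = [[FakeTensor(), FakeTensor()]]
# Three levels of nesting: the innermost loop variable is a list, not a tensor.
nested_batch = [[[FakeTensor()], [FakeTensor()]]]

if __name__ == "__main__":
    print(len(two_level_move(ok_batch, "cuda:0")))
    try:
        two_level_move(nested_batch, "cuda:0")
    except AttributeError as exc:
        print(exc)
```

Running the script shows the two-level input passing and the three-level input failing with the same message as the traceback above.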

XiangyuWu commented 1 month ago

same error

darrenzhang1007 commented 1 month ago

same error [image]

HwwwwwwwH commented 1 month ago

I discussed these issues with the vllm team yesterday, and our PR has been merged into the main branch of the official vllm repo. For now, you can try the official code. I'll update the main branch of this fork and delete the minicpmv branch.
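
Before retrying with the official code, it can help to confirm which vllm build is actually installed. The sketch below compares the installed version against the 0.5.2 reported in the environment dump above. The helper names are hypothetical, and the thread does not say which release contains the merged PR, so a newer version number is only a hint: a fork build and an official build can report similar versions.

```python
from importlib import metadata

# Version shown in the environment dump of this issue.
REPORTED_VERSION = "0.5.2"


def parse_version(text):
    """Return the leading integer components, e.g. '0.5.3.post1' -> (0, 5, 3)."""
    parts = []
    # Drop any local suffix such as '+cu118' before splitting on dots.
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_newer_than_reported(installed, reported=REPORTED_VERSION):
    """True if the installed version is strictly newer than the reported one."""
    return parse_version(installed) > parse_version(reported)


def installed_vllm_version():
    """Return the installed vllm version string, or None if vllm is not installed."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_vllm_version()
    if version is None:
        print("vllm is not installed in this environment")
    elif is_newer_than_reported(version):
        print(f"vllm {version} is newer than {REPORTED_VERSION}")
    else:
        print(f"vllm {version} is not newer than {REPORTED_VERSION}")
```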

renjingneng commented 1 month ago

> I discussed these issues with the vllm team yesterday, and our PR has been merged into the main branch of the official vllm repo. For now, you can try the official code. I'll update the main branch of this fork and delete the minicpmv branch.

Thanks a lot!