OpenBMB / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Multi-GPU inference: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3! #5

Open lys791227 opened 4 months ago

lys791227 commented 4 months ago

Your current environment

```
(VllmWorkerProcess pid=2398311) ERROR 07-22 13:28:17 multiproc_worker_utils.py:226]
  File "/u02/liuys/MiniCPM-V/vllm/vllm/worker/model_runner.py", line 1185, in execute_model
    hidden_or_intermediate_states = model_executable(
  File "/u01/liuys/anaconda3/envs/minicpm-v-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/u01/liuys/anaconda3/envs/minicpm-v-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/u02/liuys/MiniCPM-V/vllm/vllm/model_executor/models/minicpmv.py", line 583, in forward
    vlm_embeddings, vision_hidden_states = self.get_embedding(inputs)
  File "/u02/liuys/MiniCPM-V/vllm/vllm/model_executor/models/minicpmv.py", line 541, in get_embedding
    vision_hidden_states = self.get_vision_hidden_states(data)
  File "/u02/liuys/MiniCPM-V/vllm/vllm/model_executor/models/minicpmv.py", line 519, in get_vision_hidden_states
    vision_embedding = self.vpm(all_pixel_values.type(dtype), patch_attention_mask=patch_attn_mask).last_hidden_state
  File "/u01/liuys/anaconda3/envs/minicpm-v-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/u01/liuys/anaconda3/envs/minicpm-v-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/u01/liuys/anaconda3/envs/minicpm-v-env/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 715, in forward
    hidden_states = self.embeddings(pixel_values=pixel_values, patch_attention_mask=patch_attention_mask)
  File "/u01/liuys/anaconda3/envs/minicpm-v-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/u01/liuys/anaconda3/envs/minicpm-v-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/u01/liuys/anaconda3/envs/minicpm-v-env/lib/python3.10/site-packages/transformers/models/idefics2/modeling_idefics2.py", line 167, in forward
    patch_embeds = self.patch_embedding(pixel_values)
  File "/u01/liuys/anaconda3/envs/minicpm-v-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/u01/liuys/anaconda3/envs/minicpm-v-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/u01/liuys/anaconda3/envs/minicpm-v-env/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/u01/liuys/anaconda3/envs/minicpm-v-env/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3! (when checking argument for argument weight in method wrapper_CUDA__cudnn_convolution)
```
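The failing frame is `F.conv2d`: the pixel values reach the vision tower's patch-embedding `Conv2d` on `cuda:0` while that layer's weight was placed on `cuda:3` under multi-GPU sharding. A common workaround for this class of error is to move the inputs onto whatever device the module's own parameters live on before calling it. The helper below is a minimal sketch of that idea, not vllm's actual fix; the name `to_module_device` is hypothetical, and where exactly it would be applied (e.g. around the `self.vpm(...)` call in `get_vision_hidden_states`) is an assumption.

```python
import torch
import torch.nn as nn


def to_module_device(module: nn.Module, *tensors: torch.Tensor) -> tuple:
    """Return copies of `tensors` moved to the device of `module`'s weights.

    Under tensor/pipeline parallelism a submodule may live on a different
    GPU (e.g. cuda:3) than the device where the inputs were built (cuda:0);
    moving the inputs to the module's parameter device avoids the
    "Expected all tensors to be on the same device" RuntimeError.
    """
    device = next(module.parameters()).device
    return tuple(t.to(device) for t in tensors)


# Illustrative use at the call site from the traceback (names from the
# traceback, patch location assumed):
#   all_pixel_values, patch_attn_mask = to_module_device(
#       self.vpm, all_pixel_values.type(dtype), patch_attn_mask)
#   vision_embedding = self.vpm(
#       all_pixel_values, patch_attention_mask=patch_attn_mask
#   ).last_hidden_state
```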

How would you like to use vllm

I want to run inference of a [specific model](put link here). I don't know how to integrate it with vllm.