LLaVA-VL / LLaVA-NeXT

Apache License 2.0

Why can't the code run inference with multiple GPUs? #154

Closed JinhuiYE closed 1 month ago

JinhuiYE commented 1 month ago

I can run inference on a single GPU, but it fails with CUDA_VISIBLE_DEVICES=0,1.

I have 8×A800 GPUs and am trying to run the llava-onevision-qwen2-72b-ov model. Here is the bug:

```
File "LLaVA-NeXT/llava/model/llava_arch.py", line 363, in prepare_inputs_labels_for_multimodal
    image_feature = torch.cat((image_feature, self.model.image_newline[None]), dim=0)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0! (when checking argument for argument tensors in method wrapper_CUDA_cat)
```
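For context: `torch.cat` requires all of its inputs to live on the same device, and with the default `device_map="auto"` a 72B model's layers are sharded across GPUs, so `image_newline` can end up on a different device than `image_feature`. A minimal stand-alone sketch of the failure mode and the fix (using a hypothetical `FakeTensor` class in place of a real `torch.Tensor`, so it runs without a GPU):

```python
class FakeTensor:
    """Tiny stand-in for torch.Tensor: just data plus a device tag."""
    def __init__(self, data, device="cuda:0"):
        self.data = list(data)
        self.device = device

    def to(self, device):
        # Mimics Tensor.to(device): returns a copy on the target device.
        return FakeTensor(self.data, device)


def cat(tensors):
    """Mimics torch.cat's same-device requirement."""
    devices = {t.device for t in tensors}
    if len(devices) > 1:
        raise RuntimeError(
            f"Expected all tensors to be on the same device, "
            f"but found at least two devices: {sorted(devices)}"
        )
    return FakeTensor(sum((t.data for t in tensors), []), tensors[0].device)


image_feature = FakeTensor([1, 2], device="cuda:7")
image_newline = FakeTensor([0], device="cuda:0")

# Without .to(...), this reproduces the error shape from the traceback:
try:
    cat([image_feature, image_newline])
except RuntimeError as e:
    print(e)

# The fix: move image_newline onto image_feature's device before concatenating.
result = cat([image_feature, image_newline.to(image_feature.device)])
print(result.device)  # cuda:7
```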

SiniShell1 commented 1 month ago

I just faced the same problem and solved it by passing device_map=accelerator.device instead of the default value "auto" while loading the model:

```python
lm_tokenizer, lm_model, lm_image_processor, lm_max_length = load_pretrained_model(
    "lmms-lab/llava-onevision-qwen2-0.5b-si", None, "llava_qwen",
    device_map=accelerator.device,
)
```

JinhuiYE commented 1 month ago

Yes, I fixed it by replacing that line with:

```python
image_feature = torch.cat((image_feature, self.model.image_newline[None].to(image_feature.device)), dim=0)
```

shalini-maiti commented 2 weeks ago

> .to(image_feature.device)

Hello, how do you initialize the variable `accelerator`?
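For readers with the same question: `accelerator` in the earlier comment is presumably an instance of Hugging Face Accelerate's `Accelerator` class. A minimal sketch, with a plain-CPU fallback (an assumption added here so the snippet runs even where `accelerate` is not installed):

```python
# Assumption: `accelerator` is a Hugging Face Accelerate Accelerator instance.
try:
    from accelerate import Accelerator

    accelerator = Accelerator()   # detects the right device for this process
    device = accelerator.device   # e.g. torch.device("cuda:0") under CUDA
except ImportError:
    device = "cpu"                # fallback assumption so this sketch runs anywhere

print(device)

# The model would then be loaded with that device, as in the comment above.
# (load_pretrained_model is LLaVA-NeXT's loader; the call is not executed here.)
# lm_tokenizer, lm_model, lm_image_processor, lm_max_length = load_pretrained_model(
#     "lmms-lab/llava-onevision-qwen2-0.5b-si", None, "llava_qwen", device_map=device
# )
```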