OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o's performance.
https://internvl.readthedocs.io/en/latest/
MIT License

After LoRA-only SFT of the 40B model, inference raises "Expected all tensors to be on the same device" #685

Open hahapt opened 1 month ago

hahapt commented 1 month ago

Following the commands in the documentation, I completed LoRA fine-tuning of the 40B model. The fine-tuning command was:

```shell
GPUS=8 PER_DEVICE_BATCH_SIZE=2 sh shell/internvl2.0/2nd_finetune/internvl2_40b_hermes2_yi_34b_dynamic_res_2nd_finetune_lora.sh
```

The fine-tuned model loads successfully with:

```python
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map='auto').eval()
tokenizer = AutoTokenizer.from_pretrained(path)
```

Inference is run as follows:

```python
generation_config = dict(
    num_beams=1,
    max_new_tokens=1024,
    do_sample=False,
)
print("Single image check-----------------")
pixel_values = load_image('/root/InternVL/examples/image3.jpg', max_num=6).to(torch.float16).cuda(3)
question = 'In this image, could you detect cars?'
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
```

Inference fails with the following error:

```
Traceback (most recent call last):
  File "/root/InternVL-main/inference_internvl2.py", line 203, in <module>
    response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
  File "/root/.cache/huggingface/modules/transformers_modules/internvl2_40b_hermes2_yi_34b_dynamic_res_2nd_finetune_lora/modeling_internvl_chat.py", line 280, in chat
    generation_output = self.generate(
  File "/root/anaconda3/envs/pllava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/internvl2_40b_hermes2_yi_34b_dynamic_res_2nd_finetune_lora/modeling_internvl_chat.py", line 330, in generate
    outputs = self.language_model.generate(
  File "/root/anaconda3/envs/pllava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/pllava/lib/python3.10/site-packages/transformers/generation/utils.py", line 1479, in generate
    return self.greedy_search(
  File "/root/anaconda3/envs/pllava/lib/python3.10/site-packages/transformers/generation/utils.py", line 2380, in greedy_search
    next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0!
```
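Editor's note: a likely culprit here is that `device_map='auto'` shards the model across GPUs, so the vision tower and the language model's embedding/head may not live on `cuda:3`, where the code above hand-placed `pixel_values`. After loading, `model.hf_device_map` records which device each module was assigned; inputs should follow that map rather than an arbitrary GPU. The helper below is a hypothetical sketch (not part of InternVL) of how such a prefix-based map resolves a submodule to its device, with the longest matching prefix winning:

```python
def device_for(hf_device_map, module_name):
    """Resolve the device for module_name from an accelerate-style device map.

    hf_device_map maps module-name prefixes (e.g. 'vision_model') to device
    indices; a submodule inherits the device of its longest matching prefix.
    """
    best = None
    for prefix, dev in hf_device_map.items():
        if module_name == prefix or module_name.startswith(prefix + '.'):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, dev)
    if best is None:
        raise KeyError(f'{module_name} not covered by device map')
    return best[1]

# Illustrative map, shaped like what device_map='auto' might produce:
example_map = {
    'vision_model': 0,
    'language_model.model.layers.0': 0,
    'language_model.model.layers.1': 1,
    'language_model.lm_head': 2,
}
```

With a map like this, `pixel_values` belongs on the vision tower's device (`device_for(example_map, 'vision_model')`, i.e. `cuda:0`), not `cuda:3`; the `cuda:2` vs `cuda:0` clash in the traceback is the same kind of mismatch occurring between the sharded LM head and the generation bookkeeping tensors.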

Additional note: the same commands run without error against the official 40B weights.
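Editor's note: a common workaround for this class of error is to pass a hand-built `device_map` instead of `'auto'`, pinning the vision tower, token embeddings, final norm, and LM head to the same GPU so that `generate()` never mixes devices, while spreading only the decoder layers across cards. The sketch below is an assumption-laden illustration: the module names (`vision_model`, `mlp1`, `language_model.model.*`) mirror the names visible in the traceback's `modeling_internvl_chat.py`, but should be checked against `model.hf_device_map` for the actual checkpoint:

```python
import math

def build_device_map(num_layers, num_gpus):
    """Pin non-layer modules to GPU 0; round-robin-free split of decoder layers.

    Module names are assumed from InternVL's reported structure and should be
    verified against model.hf_device_map before use.
    """
    device_map = {
        'vision_model': 0,                        # vision tower with the inputs
        'mlp1': 0,                                # vision-to-LM projector
        'language_model.model.embed_tokens': 0,   # input embeddings
        'language_model.model.norm': 0,           # final norm
        'language_model.lm_head': 0,              # output head, used by generate()
    }
    per_gpu = math.ceil(num_layers / num_gpus)
    for i in range(num_layers):
        device_map[f'language_model.model.layers.{i}'] = min(i // per_gpu, num_gpus - 1)
    return device_map
```

The map would then be passed as `device_map=build_device_map(num_layers, num_gpus)` to `AutoModel.from_pretrained`, with `pixel_values` sent to `cuda:0` to match the pinned vision tower.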

FAFUuser commented 1 day ago

How much GPU memory does running the 40B model require?