OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o's performance.
https://internvl.readthedocs.io/en/latest/
MIT License

After LoRA-only SFT of the 40B model, inference raises "Expected all tensors to be on the same device" #685

Open hahapt opened 1 month ago

hahapt commented 1 month ago

Following the commands in the documentation, I completed LoRA fine-tuning of the 40B model. The fine-tuning command was:

```shell
GPUS=8 PER_DEVICE_BATCH_SIZE=2 sh shell/internvl2.0/2nd_finetune/internvl2_40b_hermes2_yi_34b_dynamic_res_2nd_finetune_lora.sh
```

The fine-tuned model loads successfully with:

```python
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map='auto').eval()
tokenizer = AutoTokenizer.from_pretrained(path)
```

Inference is run as follows:

```python
generation_config = dict(
    num_beams=1,
    max_new_tokens=1024,
    do_sample=False,
)
print("Single image check-----------------")
pixel_values = load_image('/root/InternVL/examples/image3.jpg', max_num=6).to(torch.float16).cuda(3)
question = 'In this image, could you detect cars?'
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
```

Inference fails with the following error:

```
Traceback (most recent call last):
  File "/root/InternVL-main/inference_internvl2.py", line 203, in <module>
    response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
  File "/root/.cache/huggingface/modules/transformers_modules/internvl2_40b_hermes2_yi_34b_dynamic_res_2nd_finetune_lora/modeling_internvl_chat.py", line 280, in chat
    generation_output = self.generate(
  File "/root/anaconda3/envs/pllava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/internvl2_40b_hermes2_yi_34b_dynamic_res_2nd_finetune_lora/modeling_internvl_chat.py", line 330, in generate
    outputs = self.language_model.generate(
  File "/root/anaconda3/envs/pllava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/pllava/lib/python3.10/site-packages/transformers/generation/utils.py", line 1479, in generate
    return self.greedy_search(
  File "/root/anaconda3/envs/pllava/lib/python3.10/site-packages/transformers/generation/utils.py", line 2380, in greedy_search
    next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0!
```
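Editor's note: a likely culprit here is that `device_map='auto'` shards the model across GPUs, so the vision tower and the language model's embedding/head may not live on `cuda:3`, where the code above hand-placed `pixel_values`. After loading, `model.hf_device_map` records which device each module was assigned; inputs should follow that map rather than an arbitrary GPU. The helper below is a hypothetical sketch (not part of InternVL) of how such a prefix-based map resolves a submodule to its device, with the longest matching prefix winning:

```python
def device_for(hf_device_map, module_name):
    """Resolve the device for module_name from an accelerate-style device map.

    hf_device_map maps module-name prefixes (e.g. 'vision_model') to device
    indices; a submodule inherits the device of its longest matching prefix.
    """
    best = None
    for prefix, dev in hf_device_map.items():
        if module_name == prefix or module_name.startswith(prefix + '.'):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, dev)
    if best is None:
        raise KeyError(f'{module_name} not covered by device map')
    return best[1]

# Illustrative map, shaped like what device_map='auto' might produce:
example_map = {
    'vision_model': 0,
    'language_model.model.layers.0': 0,
    'language_model.model.layers.1': 1,
    'language_model.lm_head': 2,
}
```

With a map like this, `pixel_values` belongs on the vision tower's device (`device_for(example_map, 'vision_model')`, i.e. `cuda:0`), not `cuda:3`; the `cuda:2` vs `cuda:0` clash in the traceback is the same kind of mismatch occurring between the sharded LM head and the generation bookkeeping tensors.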

Additional note: the same commands run without error against the official 40B weights.
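Editor's note: a common workaround for this class of error is to pass a hand-built `device_map` instead of `'auto'`, pinning the vision tower, token embeddings, final norm, and LM head to the same GPU so that `generate()` never mixes devices, while spreading only the decoder layers across cards. The sketch below is an assumption-laden illustration: the module names (`vision_model`, `mlp1`, `language_model.model.*`) mirror the names visible in the traceback's `modeling_internvl_chat.py`, but should be checked against `model.hf_device_map` for the actual checkpoint:

```python
import math

def build_device_map(num_layers, num_gpus):
    """Pin non-layer modules to GPU 0; round-robin-free split of decoder layers.

    Module names are assumed from InternVL's reported structure and should be
    verified against model.hf_device_map before use.
    """
    device_map = {
        'vision_model': 0,                        # vision tower with the inputs
        'mlp1': 0,                                # vision-to-LM projector
        'language_model.model.embed_tokens': 0,   # input embeddings
        'language_model.model.norm': 0,           # final norm
        'language_model.lm_head': 0,              # output head, used by generate()
    }
    per_gpu = math.ceil(num_layers / num_gpus)
    for i in range(num_layers):
        device_map[f'language_model.model.layers.{i}'] = min(i // per_gpu, num_gpus - 1)
    return device_map
```

The map would then be passed as `device_map=build_device_map(num_layers, num_gpus)` to `AutoModel.from_pretrained`, with `pixel_values` sent to `cuda:0` to match the pinned vision tower.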

FAFUuser commented 1 day ago

How much GPU memory does running the 40B model require?