OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model approaching GPT-4V performance
https://internvl.github.io/
MIT License

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! #229

Status: Open. Qinger27 opened this issue 1 month ago

Qinger27 commented 1 month ago

I load the model with model = AutoModel.from_pretrained(path, torch_dtype=torch.float16, low_cpu_mem_usage=True, trust_remote_code=True, device_map='auto').eval(). How should the input tensors be handled? No matter which CUDA device I move the data to, I get: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! Could someone take a look?

czczup commented 1 month ago

Many people have encountered this bug, and I haven't yet found a good solution that fits all situations. There is one workaround that involves manually assigning devices to the model.

For instance, if the model has a total of 26B parameters, ideally, each of the 2 GPUs should handle 13B. Therefore, excluding the 6B from ViT, GPU 0 should still handle 7B. This means 1/3 of the 20B LLM should be on GPU 0, and 2/3 on GPU 1.

Here is how you can write it in code:

import torch
from transformers import AutoModel

device_map = {
    'vision_model': 0,                        # ViT (~6B) on GPU 0
    'mlp1': 0,                                # vision-to-LLM projector, next to the ViT
    'language_model.model.tok_embeddings': 0,
    'language_model.model.norm': 1,
    'language_model.output.weight': 1
}
# First 16 of the 48 LLM layers (1/3) on GPU 0, the remaining 32 (2/3) on GPU 1
for i in range(16):
    device_map[f'language_model.model.layers.{i}'] = 0
for i in range(16, 48):
    device_map[f'language_model.model.layers.{i}'] = 1
print(device_map)
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map=device_map
).eval()
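The hand-written split above can be generalized. The sketch below is my own helper, not part of the InternVL or transformers API; the layer count (48), the ViT-on-GPU-0 convention, and the ViT cost of roughly 16 layers' worth of parameters are assumptions taken from the example above so that the helper reproduces the same map for 2 GPUs:

```python
import math

def make_device_map(num_layers=48, num_gpus=2, vit_cost_in_layers=16):
    """Split the LLM layers proportionally across GPUs, counting the ViT
    as if GPU 0 already held `vit_cost_in_layers` layers' worth of weights."""
    total = num_layers + vit_cost_in_layers
    per_gpu = math.ceil(total / num_gpus)
    device_map = {
        'vision_model': 0,
        'mlp1': 0,
        'language_model.model.tok_embeddings': 0,
    }
    # GPU 0 starts "pre-loaded" with the ViT's share.
    loads = [vit_cost_in_layers] + [0] * (num_gpus - 1)
    layer = 0
    for gpu in range(num_gpus):
        while loads[gpu] < per_gpu and layer < num_layers:
            device_map[f'language_model.model.layers.{layer}'] = gpu
            loads[gpu] += 1
            layer += 1
    # Final norm and output head go on the last GPU, as in the map above.
    device_map['language_model.model.norm'] = num_gpus - 1
    device_map['language_model.output.weight'] = num_gpus - 1
    return device_map
```

For 2 GPUs this assigns layers 0-15 to GPU 0 and layers 16-47 to GPU 1, matching the hand-written map.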
LIN-SHANG commented 4 days ago

I just tried this and still got an error. Could someone take a look?

(internvl_a100) [hadoop-aipnlp@set-zw04-kubernetes-pc206 internvl_chat]$ python multiGPU_cli.py
{'vision_model': 0, 'mlp1': 0, 'language_model.model.tok_embeddings': 0, 'language_model.model.norm': 1, 'language_model.output.weight': 1, 'language_model.model.layers.0': 0, ..., 'language_model.model.layers.15': 0, 'language_model.model.layers.16': 1, ..., 'language_model.model.layers.47': 1}
Loading checkpoint shards: 100%|█████████████████████| 11/11 [00:15<00:00, 1.43s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
dynamic ViT batch size: 7
Traceback (most recent call last):
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/leishanglin/InternVL/internvl_chat/multiGPU_cli.py", line 136, in <module>
    response = model.chat(tokenizer, pixel_values, question, generation_config)
  File "/home/hadoop-aipnlp/.cache/huggingface/modules/transformers_modules/modeling_internvl_chat.py", line 304, in chat
    generation_output = self.generate(
  File "/mnt/dolphinfs/ssd_pool/docker/user/hadoop-aipnlp/sunxiaofei10/anaconda3/envs/internvl_a100/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/hadoop-aipnlp/.cache/huggingface/modules/transformers_modules/modeling_internvl_chat.py", line 354, in generate
    outputs = self.language_model.generate(
  File "/mnt/dolphinfs/ssd_pool/docker/user/hadoop-aipnlp/sunxiaofei10/anaconda3/envs/internvl_a100/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/dolphinfs/ssd_pool/docker/user/hadoop-aipnlp/sunxiaofei10/anaconda3/envs/internvl_a100/lib/python3.9/site-packages/transformers/generation/utils.py", line 1479, in generate
    return self.greedy_search(
  File "/mnt/dolphinfs/ssd_pool/docker/user/hadoop-aipnlp/sunxiaofei10/anaconda3/envs/internvl_a100/lib/python3.9/site-packages/transformers/generation/utils.py", line 2380, in greedy_search
    next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
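A guess at the remaining failure, not a confirmed fix: the traceback dies in greedy_search, where unfinished_sequences lives on the device of input_ids (here cuda:0) while next_tokens is derived from the logits produced by the output head, which the map above pins to cuda:1. A variant worth trying keeps the parts that produce and consume token ids (embeddings, final norm, output head) together on GPU 0, shifting a couple of LLM layers to GPU 1 to compensate; the exact layer counts here are my assumption, not from the thread:

```python
device_map = {
    'vision_model': 0,
    'mlp1': 0,
    # Keep token-id producers/consumers on the same device as the inputs.
    'language_model.model.tok_embeddings': 0,
    'language_model.model.norm': 0,
    'language_model.output.weight': 0,
}
for i in range(14):        # slightly fewer LLM layers on GPU 0 to make room
    device_map[f'language_model.model.layers.{i}'] = 0
for i in range(14, 48):
    device_map[f'language_model.model.layers.{i}'] = 1
```

If this still fails, it may be worth checking which device pixel_values and the tokenizer outputs are moved to before calling model.chat.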