InternLM / InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Multi-machine inference with float16 #189

Closed · chaochen1998 closed this issue 4 months ago

chaochen1998 commented 4 months ago

Hello, will you release code for multi-machine inference with the 'internlm-xcomposer2-vl-7b' model?

chaochen1998 commented 4 months ago

I want to use the float32 model, but each of my cards has only 24 GB of VRAM, and with two 24 GB cards I don't know how to set device_map. My current setting is:

{
    "model.tok_embeddings": 0,
    "model.layers.0": 0,
    "model.layers.1": 0,
    "model.layers.2": 0,
    "model.layers.3": 0,
    "model.layers.4": 0,
    "model.layers.5": 0,
    "model.layers.6": 0,
    "model.layers.7": 0,
    "model.layers.8": 0,
    "model.layers.9": 0,
    "model.layers.10": 0,
    "model.layers.11": 0,
    "model.layers.12": 0,
    "model.layers.13": 0,
    "model.layers.14": 0,
    "model.layers.15": 0,
    "model.layers.16": 1,
    "model.layers.17": 1,
    "model.layers.18": 1,
    "model.layers.19": 1,
    "model.layers.20": 1,
    "model.layers.21": 1,
    "model.layers.22": 1,
    "model.layers.23": 1,
    "model.layers.24": 1,
    "model.layers.25": 1,
    "model.layers.26": 1,
    "model.layers.27": 1,
    "model.layers.28": 1,
    "model.layers.29": 1,
    "model.layers.30": 1,
    "model.layers.31": 1,
    "model.norm": 1,
    "output": 1,
    "vit": 1,
    "vision_proj": 1
}
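
For reference, I load the model with this map roughly as follows (a sketch; the map is the dict above in compact form):

import torch
from transformers import AutoModel, AutoTokenizer

ckpt = 'internlm/internlm-xcomposer2-vl-7b'

# Same mapping as above: layers 0-15 on cuda:0, layers 16-31 plus the
# vision tower on cuda:1.
device_map = {'model.tok_embeddings': 0, 'model.norm': 1, 'output': 1,
              'vit': 1, 'vision_proj': 1}
device_map.update({f'model.layers.{i}': 0 if i < 16 else 1 for i in range(32)})

tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModel.from_pretrained(
    ckpt,
    torch_dtype=torch.float32,  # full precision, hence the two 24G cards
    device_map=device_map,
    trust_remote_code=True,
).eval()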

The reported error is:

Traceback (most recent call last):
  File "/mnt/zeron-vepfs/panqu.wang/workspace/InternLM-XComposer/examples/Inference.py", line 35, in <module>
    response, _ = model.chat(tokenizer, query=text, image=image, history=[], do_sample=False)
  File "/opt/conda/envs/intern_clean/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/modeling_internlm_xcomposer2.py", line 501, in chat
    inputs, im_mask = self.interleav_wrap_chat(tokenizer, query, image, history, meta_instruction)
  File "/root/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/modeling_internlm_xcomposer2.py", line 199, in interleav_wrap_chat
    wrap_embeds = torch.cat(wrap_embeds, dim=1)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat)

Looking forward to your reply!

rexainn commented 4 months ago

Have you solved this? How should multi-machine inference be done? Also, inference with internlm-xcomposer2-vl-7b on a single 40 GB card failed for me too.

rexainn commented 4 months ago

@myownskyW7 @yhcao6

Asianfleet commented 4 months ago

@rexainn I ran into an OOM problem on my A100-40G. My fix was to change the example's model = AutoModel.from_pretrained('internlm/internlm-xcomposer2-vl-7b', trust_remote_code=True).cuda().eval() to model = AutoModel.from_pretrained('internlm/internlm-xcomposer2-vl-7b', device_map='cuda', trust_remote_code=True).half().eval(), after which it runs.
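
For completeness, a minimal end-to-end sketch of that fix (the query format follows the repo example; the image path is a placeholder):

import torch
from transformers import AutoModel, AutoTokenizer

ckpt = 'internlm/internlm-xcomposer2-vl-7b'
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
# half() keeps the 7B weights within a single 40G card
model = AutoModel.from_pretrained(
    ckpt, device_map='cuda', trust_remote_code=True).half().eval()

query = '<ImageHere>Please describe this image in detail.'
image = './examples/image1.webp'  # placeholder: any local image path
response, _ = model.chat(
    tokenizer, query=query, image=image, history=[], do_sample=False)
print(response)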

panzhang0212 commented 4 months ago

@SANJINGSHOU14 "vit" and "vision_proj" should be set to device 0 so that the features from "vit" and "model.tok_embeddings" can be concatenated.
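
In terms of the map posted above, only the two vision entries change (a sketch of the corrected split):

device_map = {'model.tok_embeddings': 0, 'model.norm': 1, 'output': 1,
              'vit': 0,           # moved from device 1 to device 0
              'vision_proj': 0}   # moved from device 1 to device 0
device_map.update({f'model.layers.{i}': 0 if i < 16 else 1 for i in range(32)})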

We have added support for multi-GPU inference; please refer to https://github.com/InternLM/InternLM-XComposer?tab=readme-ov-file#inference-on-multiple-gpus
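
As a quick sketch (the README recipe may differ), such a map can also be applied with accelerate's dispatch_model after loading the model on CPU:

import torch
from accelerate import dispatch_model
from transformers import AutoModel

model = AutoModel.from_pretrained(
    'internlm/internlm-xcomposer2-vl-7b',
    torch_dtype=torch.float16, trust_remote_code=True).eval()

# Corrected two-GPU split: "vit" and "vision_proj" live on device 0,
# alongside "model.tok_embeddings", so their features can be concatenated.
device_map = {'model.tok_embeddings': 0, 'model.norm': 1, 'output': 1,
              'vit': 0, 'vision_proj': 0}
device_map.update({f'model.layers.{i}': 0 if i < 16 else 1 for i in range(32)})
model = dispatch_model(model, device_map=device_map)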

Feel free to reopen this issue if you have any problems.