DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
BSD 3-Clause "New" or "Revised" License

Inference: how to use multiple V100s instead of a single A100? #107

Open flying2023 opened 1 year ago

flying2023 commented 1 year ago

Hi, I only have V100 machines, and loading the llama-13B model runs out of memory (OOM). Since I have multiple V100s, how can I do something like automap and shard the model across several V100 GPUs?

xmy0916 commented 1 year ago

Just set the `device_map` argument of `LlamaForCausalLM.from_pretrained` to `auto`. The initialization code may need some small changes.
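A minimal sketch of that change, assuming the Hugging Face `from_pretrained` API; the model path, GPU count, and per-GPU memory budget below are illustrative, not taken from the repo:

```python
def build_max_memory(num_gpus, per_gpu_gib):
    """Build an accelerate-style max_memory map so device_map='auto'
    spreads layers across several small GPUs instead of one big one."""
    return {i: f"{per_gpu_gib}GiB" for i in range(num_gpus)}


def load_sharded_llama(model_path, num_gpus=4, per_gpu_gib=28):
    # transformers is imported lazily so the sketch stays importable
    # without it installed; the call itself needs the real weights.
    from transformers import LlamaForCausalLM

    return LlamaForCausalLM.from_pretrained(
        model_path,
        device_map="auto",  # shard layers across all visible GPUs
        max_memory=build_max_memory(num_gpus, per_gpu_gib),
        torch_dtype="auto",
    )
```

With `max_memory` capped slightly below each card's 32GB, `auto` leaves headroom for activations during inference.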

flying2023 commented 1 year ago

```python
if ckpt_path:
    print("Load first Checkpoint: {}".format(ckpt_path))
    ckpt = torch.load(ckpt_path, map_location="cpu")
    msg = model.load_state_dict(ckpt['model'], strict=False)
ckpt_path_2 = cfg.get("ckpt_2", "")
if ckpt_path_2:
    print("Load second Checkpoint: {}".format(ckpt_path_2))
    ckpt = torch.load(ckpt_path_2, map_location="cpu")
    msg = model.load_state_dict(ckpt['model'], strict=False)
```

After changing the `device_map` argument of `LlamaForCausalLM.from_pretrained` to `auto`, won't the `load_state_dict` calls above still load everything onto a single card and OOM?
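It shouldn't, assuming the standard PyTorch semantics: `map_location="cpu"` keeps the checkpoint tensors in host RAM, and `load_state_dict` copies each tensor onto whichever device its target parameter was already dispatched to. A sketch of that loading pattern, plus a small hypothetical helper for sanity-checking how `auto` distributed the modules:

```python
from collections import Counter


def shards_per_device(device_map):
    """Count how many modules an accelerate-style device_map assigns
    to each device -- useful to confirm 'auto' used every V100."""
    return Counter(device_map.values())


def load_extra_checkpoint(model, ckpt_path):
    """Sketch of the repo's loading step. map_location='cpu' keeps the
    checkpoint in host RAM; load_state_dict then copies each tensor to
    the device of its matching parameter, so no single GPU has to hold
    the whole checkpoint."""
    import torch  # imported lazily; sketch only

    ckpt = torch.load(ckpt_path, map_location="cpu")
    return model.load_state_dict(ckpt["model"], strict=False)
```

The per-GPU footprint is then set by the model shards themselves, not by the checkpoint loading.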

james-hu commented 9 months ago

Has anyone gotten this working? I can't even run it on a 24GB GPU.

james-hu commented 9 months ago

I ended up using `low_resource: True`. With that, 13b runs on a single 24GB GPU.
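In MiniGPT-4-style configs like the ones Video-LLaMA uses, `low_resource` sits under the `model:` section of the eval config; the fragment below is a sketch (the exact file and the 8-bit behavior are assumptions based on that config family, not confirmed in this thread):

```yaml
model:
  arch: video_llama
  low_resource: True   # assumed to load the LLaMA weights in 8-bit, trading speed for memory
```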