RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
I want to deploy the model across multiple GPUs (a single GPU in our lab cannot fit the 7B model), but I did not see any mention of multi-GPU loading in the project's documentation, so I tried modifying the model-loading call `from_pretrained()` myself. In `toolbench/inference/LLM/tool_llama_model.py` (the model-loading code), on the line that loads the model,

`self.model = AutoModelForCausalLM.from_pretrained(model_name_or_path, low_cpu_mem_usage=True)`

I added `device_map='auto'` to the arguments, but this raises the error shown above. What is causing this, and does anyone have a working approach for multi-GPU loading?
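For reference, a common pattern when using `device_map='auto'` (which requires the `accelerate` package to be installed) is to also pass a `max_memory` map so the sharding does not overfill any one GPU. This is only a sketch under those assumptions, not a confirmed fix for the error above; `build_max_memory` is a hypothetical helper, and the GPU counts/sizes are examples:

```python
def build_max_memory(num_gpus, per_gpu_gib, cpu_gib=0):
    """Build a max_memory dict for transformers' device_map='auto',
    capping each GPU at per_gpu_gib GiB (hypothetical helper)."""
    max_memory = {i: f"{per_gpu_gib}GiB" for i in range(num_gpus)}
    if cpu_gib:
        # Optionally allow CPU offload for layers that do not fit on GPU
        max_memory["cpu"] = f"{cpu_gib}GiB"
    return max_memory

# Example: two 24 GiB GPUs, capped at 20 GiB each to leave headroom
mm = build_max_memory(2, 20)
print(mm)  # {0: '20GiB', 1: '20GiB'}

# How it would be passed to the loading call in tool_llama_model.py
# (requires accelerate installed; not executed here):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     model_name_or_path,
#     low_cpu_mem_usage=True,
#     device_map="auto",
#     max_memory=mm,
# )
```

Setting the cap below the physical GPU size leaves room for activations and KV cache at inference time; without `max_memory`, `device_map='auto'` tries to use nearly all available memory on each device.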