RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
I want to deploy the model across multiple GPUs (a single GPU in our lab cannot fit the 7B model), but I did not see any mention of multi-GPU loading in the project's documentation, so I tried modifying the model-loading call `from_pretrained()` myself. In `toolbench/inference/LLM/tool_llama_model.py` (the model-loading code), on the line that loads the model,

`self.model = AutoModelForCausalLM.from_pretrained(model_name_or_path, low_cpu_mem_usage=True)`

I added `device_map='auto'` to the arguments, but this raises the error shown above. What is causing this, and does anyone have a working approach for multi-GPU loading?
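For reference, a common pattern when using `device_map='auto'` (which requires the `accelerate` package to be installed) is to also pass a `max_memory` map so the sharding does not overfill any one GPU. This is only a sketch under those assumptions, not a confirmed fix for the error above; `build_max_memory` is a hypothetical helper, and the GPU counts/sizes are examples:

```python
def build_max_memory(num_gpus, per_gpu_gib, cpu_gib=0):
    """Build a max_memory dict for transformers' device_map='auto',
    capping each GPU at per_gpu_gib GiB (hypothetical helper)."""
    max_memory = {i: f"{per_gpu_gib}GiB" for i in range(num_gpus)}
    if cpu_gib:
        # Optionally allow CPU offload for layers that do not fit on GPU
        max_memory["cpu"] = f"{cpu_gib}GiB"
    return max_memory

# Example: two 24 GiB GPUs, capped at 20 GiB each to leave headroom
mm = build_max_memory(2, 20)
print(mm)  # {0: '20GiB', 1: '20GiB'}

# How it would be passed to the loading call in tool_llama_model.py
# (requires accelerate installed; not executed here):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     model_name_or_path,
#     low_cpu_mem_usage=True,
#     device_map="auto",
#     max_memory=mm,
# )
```

Setting the cap below the physical GPU size leaves room for activations and KV cache at inference time; without `max_memory`, `device_map='auto'` tries to use nearly all available memory on each device.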