CarryChang closed this 4 months ago
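The accelerate library used by the HuggingFace code miscalculates the GPU memory allocation for the model. The example code has been updated, which should greatly shorten model loading time.
The model loading code is now: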
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="sequential", torch_dtype=torch.bfloat16, max_memory=max_memory, attn_implementation="eager")
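The call above passes a `max_memory` argument that is not defined in this comment. As a minimal sketch, assuming every GPU gets the same budget, it could be built as below. The helper names `build_max_memory` and `load_model` and the 75 GiB figure are hypothetical, not taken from the example code. accelerate accepts a dict that maps a device index (or `"cpu"`) to a size string such as `"75GiB"`.

```python
def build_max_memory(num_gpus, per_gpu_gib, cpu_gib=None):
    """Return a max_memory dict giving each GPU the same budget.

    Hypothetical helper: keys are GPU indices (plus an optional "cpu" entry),
    values are size strings in the "<N>GiB" form that accelerate accepts.
    """
    max_memory = {i: f"{per_gpu_gib}GiB" for i in range(num_gpus)}
    if cpu_gib is not None:
        max_memory["cpu"] = f"{cpu_gib}GiB"
    return max_memory


def load_model(model_name, max_memory):
    """Load the model with the settings from the updated example code."""
    # Heavy imports are kept inside the function so the helper above
    # can be used without torch or transformers installed.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_name,
        trust_remote_code=True,
        device_map="sequential",      # fill GPU 0 first, then GPU 1, and so on
        torch_dtype=torch.bfloat16,
        max_memory=max_memory,
        attn_implementation="eager",
    )


# Example budget (assumed values): 8 GPUs with 75 GiB each.
max_memory = build_max_memory(num_gpus=8, per_gpu_gib=75)
```

With `device_map="sequential"`, devices are filled in order up to the limit given in `max_memory`, so the per-GPU budget should leave some headroom for activations.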
We recommend launching the model with vLLM: https://github.com/vllm-project/vllm/pull/4650
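For reference, a rough sketch of offline inference through vLLM's Python API, assuming a vLLM build that already includes the linked PR. The helper names, the sampling settings, and the choice of one tensor-parallel rank per GPU are illustrative assumptions, not part of the original recommendation.

```python
def vllm_engine_kwargs(model_name, num_gpus, max_model_len=None):
    """Collect keyword arguments for vllm.LLM (hypothetical helper)."""
    kwargs = {
        "model": model_name,
        "trust_remote_code": True,         # same flag as in the HuggingFace call
        "tensor_parallel_size": num_gpus,  # shard the model across the GPUs
        "dtype": "bfloat16",
    }
    if max_model_len is not None:
        kwargs["max_model_len"] = max_model_len
    return kwargs


def generate(prompt, engine_kwargs):
    """Run a single prompt through vLLM and return the generated text."""
    # Imported lazily so the kwargs helper works without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**engine_kwargs)
    params = SamplingParams(temperature=0.7, max_tokens=128)  # assumed values
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text
```

Calling `generate("Hello", vllm_engine_kwargs(model_name, num_gpus=8))` would then load the model once and return one completion; `model_name` is the same identifier used in the HuggingFace call.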