Hi, I think this is an issue with safetensors and us using an older version of accelerate due to LR scheduler bugs. Try setting `safe_serialization=False` in `save_with_accelerate`, like so:
```python
def save_with_accelerate(accelerator, model, tokenizer, output_dir, args):
    unwrapped_model = accelerator.unwrap_model(model)
    # When doing multi-gpu training, we need to use accelerator.get_state_dict(model)
    # to get the state_dict. Otherwise, sometimes the model will be saved with only
    # part of the parameters. Also, accelerator needs the wrapped model to get the state_dict.
    state_dict = accelerator.get_state_dict(model)
    if args.use_lora:
        # When using lora, the unwrapped model is a PeftModel, which doesn't support
        # the is_main_process argument and has its own save_pretrained function
        # for saving only the lora modules.
        # We have to manually check is_main_process outside the save_pretrained call.
        if accelerator.is_main_process:
            unwrapped_model.save_pretrained(output_dir, state_dict=state_dict)
    else:
        # don't use safetensors for saving for now
        unwrapped_model.save_pretrained(
            output_dir,
            is_main_process=accelerator.is_main_process,
            save_function=accelerator.save,
            state_dict=state_dict,
            safe_serialization=False,
        )
```
We will make this fix in the code shortly!
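To confirm the workaround, you can load the re-saved checkpoint back with plain transformers. A minimal sketch, assuming a hypothetical output directory (substitute your own `output_dir`):

```python
# Minimal sketch: verify the re-saved checkpoint loads with transformers.
# "output/llama2-7b-finetuned" is a hypothetical path; use your actual output_dir.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint_dir = "output/llama2-7b-finetuned"

# With safe_serialization=False, the weights are written as pytorch_model*.bin
# instead of model.safetensors, sidestepping the safetensors issue entirely.
model = AutoModelForCausalLM.from_pretrained(checkpoint_dir, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
print(f"loaded {model.num_parameters():,} parameters")
```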
Dear authors, thank you for open-sourcing this great project. After I finetune my llama2-7B model using finetune_with_accelerate.sh, I cannot load the model weights with vLLM or Hugging Face during inference.
It seems that the model is saved successfully:
But when I use vLLM to load the model weights (roughly as in the sketch below), there is an error:
I would be very grateful if you could give me some insight into how to deal with this issue.
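For reference, a minimal vLLM loading call looks roughly like this; the exact command and checkpoint path from the report are not shown, so both are assumed:

```python
# Assumed repro sketch: load the finetuned checkpoint with vLLM.
# The checkpoint path is hypothetical; substitute the actual output_dir.
from vllm import LLM, SamplingParams

llm = LLM(model="output/llama2-7b-finetuned")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```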