Open vip-china opened 9 months ago
Is the issue that the merge doesn't work, that specifying save_safetensors still produces a pytorch_model.bin, or both?
Specifying save_safetensors still produces a pytorch_model.bin.
When I use the following command to merge models, there is an error message:
Hey, it seems like the PeftModel loading failed. Can you check that the files in lora_model_dir are valid (i.e., not just a few KBs)?
Did you run out of space during training?
Since the training had a few checkpoints, could you try pointing the model_dir to one of those and see what happens?
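As a quick way to act on that suggestion, here is a minimal sketch (a hypothetical helper, not part of axolotl) for spotting the "few KBs" symptom: adapter weights for a 34B LoRA should be many MBs, so tiny files usually indicate a truncated or failed save.

```python
import os

def find_truncated_adapters(lora_dir, min_bytes=1_000_000):
    """Return adapter files in lora_dir that are smaller than min_bytes.

    Tiny adapter files (a few KB) usually mean the save was truncated,
    e.g. because disk space ran out during training.
    """
    names = ("adapter_model.bin", "adapter_model.safetensors")
    return [
        n for n in names
        if os.path.exists(os.path.join(lora_dir, n))
        and os.path.getsize(os.path.join(lora_dir, n)) < min_bytes
    ]
```

Running this against the lora_model_dir (or each checkpoint dir) should quickly show whether any saved adapter is suspiciously small.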
What I mean is that this parameter has no effect: save_safetensors: true
I'm a bit confused by this, as you said the merge failed.
Now I am resuming SFT training from a checkpoint and getting this error again.
I have configured these parameters: `use_reentrant: true` and `resume_from_checkpoint: /workspace/axolotl-main/checkpoint-5865`
Resuming from a "peft checkpoint" is not the same as resuming from a regular checkpoint. You'll want to set lora_model_dir to point to the checkpoint directory, iirc. @NanoCode012 does that sound right?
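If that's the route, a hedged sketch of the config change (the checkpoint path is the one from this thread; check the axolotl docs for the exact semantics of lora_model_dir when continuing training):

```yaml
# Point lora_model_dir at the saved PEFT checkpoint directory instead of
# using resume_from_checkpoint, per the suggestion above:
lora_model_dir: /workspace/axolotl-main/checkpoint-5865
```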
Please check that this issue hasn't been reported before.
Expected Behavior
Generate the correct LoRA adapter after training is completed.
Current behaviour
When using the following command to merge models, there is an error message:
python3 -m axolotl.cli.merge_lora sft_34b.yml --lora_model_dir="/workspace/axolotl/output/Yi-34B/ljf-yi-34b-lora" --output_dir=/data1/ljf2/data-check-test
Steps to reproduce
The save_safetensors: true parameter does not take effect: the run actually generates adapter_model.bin instead of adapter_model.safetensors.
Config yaml
No response
Possible solution
No response
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
main
Acknowledgements