foundation-model-stack / fms-hf-tuning

🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP.
Apache License 2.0
28 stars, 48 forks

fix: unable to find output_dir in multi-GPU during resume_from_checkpoint check #352

Closed Abhishek-TAMU closed 2 months ago

Abhishek-TAMU commented 2 months ago

Description of the change

Adds code to accelerate_launch.py that creates output_dir if it doesn't already exist, so the resume_from_checkpoint check can find it in multi-GPU runs.
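
A minimal sketch of the idea, not the actual diff: the helper name `ensure_output_dir` is hypothetical, and the sketch assumes the launcher already has the `output_dir` path as a string before it runs the resume_from_checkpoint check.

```python
import os


def ensure_output_dir(output_dir: str) -> str:
    """Create output_dir (and any missing parent directories) if absent.

    Hypothetical helper for illustration; the real change lives in
    accelerate_launch.py and may be structured differently.
    """
    # exist_ok=True makes the call idempotent: it does nothing when the
    # directory already exists, so a resumed run is unaffected.
    os.makedirs(output_dir, exist_ok=True)
    return output_dir
```

Calling this once in the launcher, before any worker process inspects `output_dir` for checkpoints, means the directory is present on a fresh run and left untouched on a resumed one.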

Related issue number

#1352

How to verify the PR

Run full fine-tuning or LoRA tuning on multiple GPUs and confirm that the issue no longer occurs.

Was the PR tested

github-actions[bot] commented 2 months ago

Thanks for making a pull request! 😃 One of the maintainers will review and advise on the next steps.
