foundation-model-stack / fms-hf-tuning

🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP.
Apache License 2.0

fix: crash when output directory doesn't exist #364

Closed: HarikrishnanBalagopal closed this 1 month ago

HarikrishnanBalagopal commented 1 month ago

Description of the change

Fixes the crash that occurs when the output directory does not exist, reported in https://github.com/foundation-model-stack/fms-hf-tuning/issues/359
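The diff itself is not reproduced here. As a hedged illustration only, assume the crash comes from several ranks of a multi-GPU job trying to create the same missing output directory at the same time. Under that assumption, a race-tolerant way to create the directory in Python is `os.makedirs(..., exist_ok=True)`, which does not fail when another process creates the directory first. A check-then-create pattern such as `if not os.path.exists(d): os.mkdir(d)` can raise `FileExistsError` under the same conditions. `ensure_output_dir` is a hypothetical helper, not a function from fms-hf-tuning.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def ensure_output_dir(output_dir: str) -> str:
    """Hypothetical helper (not part of fms-hf-tuning).

    Creates output_dir and any missing parents. exist_ok=True means that
    losing the race to another worker that created the directory first
    is not an error.
    """
    os.makedirs(output_dir, exist_ok=True)
    return output_dir


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    target = os.path.join(base, "nested", "output")
    # Simulate several workers (standing in for GPU ranks) racing to
    # create the same non-existent output directory.
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(ensure_output_dir, [target] * 8))
    print(os.path.isdir(target))
```

A common alternative in distributed training is to create the directory on rank 0 only and then synchronize all ranks with a barrier. The `exist_ok=True` approach avoids that coordination step.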

Related issue number

https://github.com/foundation-model-stack/fms-hf-tuning/issues/359

How to verify the PR

Run a multi-GPU training job with an output directory that does not exist yet.

Was the PR tested

github-actions[bot] commented 1 month ago

Thanks for making a pull request! 😃 One of the maintainers will review and advise on the next steps.

kmehant commented 1 month ago

FYA - @anhuong https://github.com/foundation-model-stack/fms-hf-tuning/issues/359#issue-2558689388

Abhishek-TAMU commented 1 month ago

If the race condition has already been tested as described here, this looks good to me.

anhuong commented 1 month ago

Abhishek also tested this in the image with the accelerate_launch.py script, and it worked well there too.