foundation-model-stack / fms-hf-tuning

🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP.
Apache License 2.0

bug: Boolean values represented as strings in the default fsdp config always translate to True #80

Open kmehant opened 6 months ago

kmehant commented 6 months ago

Describe the bug

Boolean values in the fsdp config (https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/tuning/config/fsdp_config.json#L4-L6) are represented as string values. These do not translate to actual booleans when the config is loaded by the hf/transformers library (https://github.com/huggingface/transformers/blob/2a002d073a337051bdc3fbdc95ff1bc0399ae2bb/src/transformers/training_args.py#L1654): since any non-empty string is truthy in Python, even "false" ends up behaving as True.
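
A minimal sketch of the failure mode (the key name below is hypothetical, mirroring the style of fsdp_config.json; the real keys live in that file): JSON string values survive parsing as Python str, and any non-empty string is truthy.

```python
import json

# Hypothetical config fragment where the boolean is
# (incorrectly) encoded as a quoted string.
raw = '{"fsdp_use_orig_params": "false"}'

cfg = json.loads(raw)
value = cfg["fsdp_use_orig_params"]

print(type(value))  # <class 'str'> -- json leaves quoted values as strings
print(bool(value))  # True -- any non-empty string is truthy in Python,
                    # so "false" silently flips to True downstream
```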

Solution

Use JSON boolean values instead of string representations. Meanwhile, I have also raised an issue on the transformers library to discuss adding a dataclass and argument parser, with a motivating example (https://github.com/huggingface/transformers/issues/29476)
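
With the same hypothetical key, the fix is simply to use a bare JSON boolean so the parsed value is already a bool:

```python
import json

# Same hypothetical key, now with a real JSON boolean (no quotes).
fixed = json.loads('{"fsdp_use_orig_params": false}')
value = fixed["fsdp_use_orig_params"]

print(type(value))  # <class 'bool'>
print(bool(value))  # False -- the intended value survives parsing
```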

fabianlim commented 6 months ago

@kmehant while the fix is as you described, now that #53 is merged, I think it may be best to switch to accelerate, which uses a YAML config defaults file. With YAML, explicit quoting of strings with " and ' is not necessary, so it is more robust to this class of issue. I suggest the following changes:

  1. update the README.md, removing the instructions for torch.run and replacing them with accelerate.launch
  2. replace the FSDP JSON with a config yaml like the sketch below.
    • BTW, I suggest moving fsdp_config.json out of tuning/config (which houses code) into a directory that only houses config fixtures.
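
A hedged sketch of what such a defaults file might contain; the key names follow accelerate's FSDP config conventions but are illustrative, not the exact file proposed here. Note that with yaml.safe_load a bare false/true already parses as a Python bool, which is the robustness being suggested:

```python
import yaml  # PyYAML, assumed available in the environment

# Illustrative accelerate-style FSDP defaults (a sketch, not the
# exact config proposed in this thread).
config_text = """
distributed_type: FSDP
mixed_precision: bf16
fsdp_config:
  fsdp_sharding_strategy: 1
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_cpu_ram_efficient_loading: true
"""

cfg = yaml.safe_load(config_text)

# Unlike the quoted-string JSON values, bare YAML booleans parse as bool:
flag = cfg["fsdp_config"]["fsdp_cpu_ram_efficient_loading"]
print(type(flag), flag)  # <class 'bool'> True
```

Such a file would then be passed via accelerate launch --config_file rather than invoking torch.distributed.run directly.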

@Ssukriti

Ssukriti commented 6 months ago

Thanks Fabian, I created an issue for the README updates: https://github.com/foundation-model-stack/fms-hf-tuning/issues/87. We will prioritize it at the earliest.

kmehant commented 6 months ago

https://github.com/foundation-model-stack/fms-hf-tuning/issues/80#issuecomment-1989734269

I think it may be best to switch to accelerate, which uses a yaml config defaults file.

Thanks @fabianlim, I am aware of this; isn't accelerate a wrapper over torch.distributed?

I suggest the following changes

I believe @Ssukriti is tracking these in a separate issue: https://github.com/foundation-model-stack/fms-hf-tuning/issues/87

Ssukriti commented 6 months ago

@kmehant I was planning to get to issue #87 in the next 2 days as it's high priority for our deliverables, but if you are interested and want to contribute instead, feel free to do so. Just let me know so I can plan accordingly :). We do need it completed at the earliest so we can also start some multi-GPU testing on our end as well.

kmehant commented 6 months ago

https://github.com/foundation-model-stack/fms-hf-tuning/issues/80#issuecomment-1993604407

@Ssukriti I will be glad to raise a PR in a couple of hours.

fabianlim commented 6 months ago

@kmehant it's up to you, but I should be able to get to #87 pretty soon.

kmehant commented 6 months ago

@Ssukriti @fabianlim I have raised a PR here https://github.com/foundation-model-stack/fms-hf-tuning/pull/92 Thanks.

fabianlim commented 6 months ago

@Ssukriti @fabianlim I have raised a PR here https://github.com/foundation-model-stack/fms-hf-tuning/pull/92 Thanks.

@kmehant ok, looks like we duplicated work; see #91.