Open kmehant opened 6 months ago
@kmehant While the fix is as you described, now that #53 is merged, I think it may be best to switch to accelerate, which uses a yaml config defaults file. With yaml, explicit quoting of strings with `"` and `'` is not necessary, so it will be more robust to such issues. I suggest the following changes:

- drop `torchrun` and replace it with `accelerate launch`
- move `fsdp_config.json` out of `tuning/config` (which houses code) into somewhere that only houses config fixtures

@Ssukriti
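For context on the yaml suggestion: an accelerate defaults file is plain yaml, where booleans are written unquoted. A minimal sketch is below — the key names are assumed from typical accelerate FSDP configs, not taken from this repo, so verify them against the output of `accelerate config` for your version:

```yaml
# Illustrative accelerate defaults file (keys assumed, verify locally)
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
num_processes: 2
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_forward_prefetch: false   # plain yaml boolean, no quoting needed
  fsdp_use_orig_params: true
```

Because yaml resolves bare `true`/`false` to booleans, the string-vs-boolean ambiguity that caused this bug does not arise.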
Thanks Fabian, created an issue for the README updates: https://github.com/foundation-model-stack/fms-hf-tuning/issues/87. We will prioritize it at the earliest.
https://github.com/foundation-model-stack/fms-hf-tuning/issues/80#issuecomment-1989734269
I think it may be best to switch to accelerate, which uses a yaml config defaults file.
Thanks @fabianlim, I am aware of this. Isn't accelerate a wrapper over torch.distributed?
I suggest the following changes
I guess @Ssukriti is tracking them in a different issue https://github.com/foundation-model-stack/fms-hf-tuning/issues/87
@kmehant I was planning to get to issue #87 in the next 2 days as it's high priority for our deliverables, but if you are interested and want to contribute instead, feel free to do so. Just let me know so I can plan accordingly :). We do need it completed at the earliest so we can also start some multi-GPU testing on our end.
https://github.com/foundation-model-stack/fms-hf-tuning/issues/80#issuecomment-1993604407
@Ssukriti I will be glad to raise a PR in a couple of hours.
@kmehant its up to you but I should be able to get to #87 pretty soon.
@Ssukriti @fabianlim I have raised a PR here https://github.com/foundation-model-stack/fms-hf-tuning/pull/92 Thanks.
@kmehant ok looks like we duplicated work, see #91
Describe the bug
Boolean values in the fsdp config (https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/tuning/config/fsdp_config.json#L4-L6) are represented as string values. These do not translate to actual boolean values when loaded by the hf/transformers library: https://github.com/huggingface/transformers/blob/2a002d073a337051bdc3fbdc95ff1bc0399ae2bb/src/transformers/training_args.py#L1654
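To see why the string form is harmful, here is a small standalone sketch (not the fms-hf-tuning code; the key name is made up for illustration). json keeps `"true"`/`"false"` as strings, and any truthiness check then treats both as True:

```python
import json

# Boolean wrongly written as a quoted string vs. a real JSON boolean
# (key name is illustrative only, not the actual fsdp_config.json contents)
bad = json.loads('{"some_fsdp_flag": "false"}')
good = json.loads('{"some_fsdp_flag": false}')

print(type(bad["some_fsdp_flag"]))   # <class 'str'> -- stays a string
print(bool(bad["some_fsdp_flag"]))   # True -- any non-empty string is truthy
print(good["some_fsdp_flag"])        # False -- a genuine Python bool
```

So a config author writing `"false"` silently gets the opposite behavior from what they intended.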
Solution
Use JSON boolean values instead of string representations. Meanwhile, I have also raised an issue on the transformers library to discuss adding a dataclass and argument parser, with a motivating example: https://github.com/huggingface/transformers/issues/29476
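Concretely, the fix is to drop the quotes around the boolean values in `fsdp_config.json`. A sketch of the before/after (key name illustrative, not necessarily the actual file contents):

```json
{
  "broken_style_flag": "false",
  "fixed_style_flag": false
}
```

Only the unquoted form is a JSON boolean; the quoted form is a string and will not round-trip to a Python `bool` when the config is loaded.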