foundation-model-stack / fms-hf-tuning

🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP.
Apache License 2.0

feat: Extend trainer controller capabilities to have fine-grained control in multi-node-multi-gpu scenario #182

Open seshapad opened 2 weeks ago

seshapad commented 2 weeks ago

Is your feature request related to a problem? Please describe.

It should be possible to control the trainer at the per-process level, for both metric computation and operation execution.

Describe the solution you'd like

The trainer controller should support process-specific rules, so that a rule can be scoped to individual processes in a multi-node, multi-GPU run.
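To make the request concrete, here is a minimal sketch of what a process-scoped rule could look like. It is not the existing trainer controller API: `ProcessScopedRule`, `current_rank`, and the `ranks` field are hypothetical names used only for illustration. The one external assumption is that the job is launched with `torchrun`, which exports the `RANK` environment variable for each process.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, Mapping, Optional, Set


def current_rank(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the global rank of this process.

    torchrun exports RANK for every worker; fall back to 0 so the
    sketch also works in a plain single-process run.
    """
    env = os.environ if env is None else env
    return int(env.get("RANK", "0"))


@dataclass
class ProcessScopedRule:
    """Hypothetical rule that only fires on selected processes."""

    name: str
    # Predicate over the metrics computed on this process.
    predicate: Callable[[Dict[str, float]], bool]
    # Global ranks the rule applies to; None means every process.
    ranks: Optional[Set[int]] = None

    def applies_to(self, rank: int) -> bool:
        return self.ranks is None or rank in self.ranks

    def should_fire(self, metrics: Dict[str, float], rank: int) -> bool:
        # Check the process scope first so the predicate is never
        # evaluated on processes that are out of scope.
        return self.applies_to(rank) and self.predicate(metrics)


# Example: stop-on-low-loss rule that is evaluated on rank 0 only.
stop_rule = ProcessScopedRule(
    name="stop-on-low-loss",
    predicate=lambda metrics: metrics["loss"] < 1.0,
    ranks={0},
)
fire_here = stop_rule.should_fire({"loss": 0.5}, rank=current_rank())
```

Scoping by global rank is only one possible design; scoping by node or by local rank would follow the same pattern with a different selector.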

Describe alternatives you've considered

NA

Additional context

NA