🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP.
feat: Extend trainer controller capabilities to have fine-grained control in multi-node-multi-gpu scenario #182
Open
seshapad opened 2 weeks ago
Is your feature request related to a problem? Please describe.
In multi-node, multi-GPU runs, the trainer cannot currently be controlled at the per-process level. It should be possible to control both metric computation and operation execution per process.
Describe the solution you'd like
The trainer controller should support process-specific rules, so that a rule can be evaluated and its operation executed only on selected processes.
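To make the request concrete, here is a minimal sketch of what a process-specific rule could look like. It is illustrative only: `ProcessRule`, `rule_fires`, and the `ranks` field are hypothetical names, not part of the existing trainer controller. The only outside assumption is that the launcher exports a `RANK` environment variable for each worker, as `torchrun` does.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass
class ProcessRule:
    """Hypothetical per-process rule; not the existing trainer controller schema."""
    name: str
    condition: Callable[[Dict[str, float]], bool]  # evaluated on this process's metrics
    ranks: Optional[Set[int]] = None  # global ranks the rule applies to; None means all


def current_rank() -> int:
    # torchrun exports RANK for each worker; fall back to 0 for single-process runs.
    return int(os.environ.get("RANK", "0"))


def rule_fires(rule: ProcessRule, metrics: Dict[str, float], rank: int) -> bool:
    # Skip the rule entirely on processes it does not target.
    if rule.ranks is not None and rank not in rule.ranks:
        return False
    return rule.condition(metrics)


# Example: only global rank 0 evaluates the loss threshold.
stop_on_high_loss = ProcessRule(
    name="stop_on_high_loss",
    condition=lambda m: m.get("loss", 0.0) > 2.5,
    ranks={0},
)
```

Each process would call `rule_fires(rule, metrics, current_rank())` and run the associated operation only when it returns `True`. Whether rules should target global ranks, local ranks, or nodes is an open design question.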
Describe alternatives you've considered
NA
Additional context
NA