Open Muennighoff opened 2 months ago
🚀 The feature, motivation and pitch
We should probably use DDP instead of FSDP + NO_SHARD, since FSDP + NO_SHARD will be deprecated and there are issues like this one: https://github.com/pytorch/pytorch/issues/88621
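A rough sketch of what the switch could look like, in case it helps the discussion. The helper names `pick_wrapper` and `wrap_model` and the string-valued `sharding_strategy` argument are hypothetical and not taken from this repo. The sketch also assumes the process group is already initialized and the model is already on the target CUDA device.

```python
from typing import Any


def pick_wrapper(sharding_strategy: str) -> str:
    """Return "ddp" when no sharding is requested, otherwise "fsdp"."""
    return "ddp" if sharding_strategy.upper() == "NO_SHARD" else "fsdp"


def wrap_model(model: Any, sharding_strategy: str, device_id: int) -> Any:
    """Wrap `model` in DDP for NO_SHARD, and in FSDP for any other strategy.

    Hypothetical helper: assumes torch.distributed is already initialized
    and that `model` lives on the CUDA device `device_id`.
    """
    # torch imports are deferred so pick_wrapper stays usable without torch.
    if pick_wrapper(sharding_strategy) == "ddp":
        from torch.nn.parallel import DistributedDataParallel as DDP

        return DDP(model, device_ids=[device_id])

    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    from torch.distributed.fsdp import ShardingStrategy

    # Look up the enum member by name, e.g. "FULL_SHARD" or "SHARD_GRAD_OP".
    strategy = ShardingStrategy[sharding_strategy.upper()]
    return FSDP(model, sharding_strategy=strategy, device_id=device_id)
```

With something like this, a config that currently asks for NO_SHARD would end up on the DDP path, and the sharded strategies would keep going through FSDP unchanged.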
Alternatives
No response
Additional context
No response