Requesting example to use PyTorch FSDP

determined-ai / determined-examples

Example ML projects that use the Determined library.

Apache License 2.0


Hello, we haven't added an FSDP example to this repo yet, but there's an unofficial one here: https://github.com/garrett361/determined/tree/scratchwork/scratchwork/fsdp_min

For context, `PyTorchTrial` does not support FSDP, and there are no plans to add it. For FSDP, use the Core API instead; it will be effectively the same as PyTorch DDP. The standard torch distributed launcher works the same way, and metrics logging and hyperparameter search work the same. If you checkpoint the full model from rank 0, that works the same as well. If you want sharded checkpointing, use the sharded checkpointing option `shard=True`.
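A minimal sketch of the Core API approach, under stated assumptions: GPU nodes with NCCL, a toy `torch.nn.Linear` model on random data, and a hypothetical per-rank file name helper (`shard_filename`) that is not part of Determined. It takes an FSDP sharded state dict and writes each rank's shard inside `core_context.checkpoint.store_path(..., shard=True)`, which every rank calls. Hyperparameter search via `core_context.searcher` and optimizer-state checkpointing are left out for brevity. It would be launched with something like `python3 -m determined.launch.torch_distributed python3 train.py` as the experiment entrypoint.

```python
import os


def shard_filename(rank: int) -> str:
    # Hypothetical naming scheme: one state-dict file per rank, so files
    # written by different ranks into the sharded checkpoint never collide.
    return f"model-rank{rank}.pt"


def train(steps: int = 100) -> None:
    # Heavy imports live inside train() so this module can be imported
    # (e.g. for unit tests) on machines without GPUs or Determined installed.
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    from torch.distributed.fsdp import StateDictType
    import determined as det

    # RANK, WORLD_SIZE, LOCAL_RANK, etc. are set by the torch distributed launcher.
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    torch.manual_seed(0)  # same initial weights on every rank (toy model)
    model = FSDP(torch.nn.Linear(1024, 1024).cuda())
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    # Tell Determined about the process layout the launcher created.
    distributed = det.core.DistributedContext.from_torch_distributed()
    with det.core.init(distributed=distributed) as core_context:
        for step in range(1, steps + 1):
            x = torch.randn(32, 1024, device="cuda")  # placeholder random batch
            loss = model(x).pow(2).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Report metrics from the chief only, as with plain DDP.
            if step % 10 == 0 and rank == 0:
                core_context.train.report_training_metrics(
                    steps_completed=step, metrics={"loss": loss.item()}
                )

        # Sharded checkpoint: each rank keeps only its own shard of the weights.
        with FSDP.state_dict_type(model, StateDictType.SHARDED_STATE_DICT):
            state = model.state_dict()
        metadata = {"steps_completed": steps}
        # Every rank must call store_path when shard=True.
        with core_context.checkpoint.store_path(metadata, shard=True) as (path, _):
            torch.save(state, path / shard_filename(rank))

    dist.destroy_process_group()


if __name__ == "__main__":
    train()
```

Per-rank files saved this way generally need the same world size to load back; if you need to resume on a different number of GPUs, PyTorch's `torch.distributed.checkpoint` supports resharding on load.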

Requesting example to use PyTorch FSDP #19