facebookresearch / d2go

D2Go is a toolkit for efficient deep learning
Apache License 2.0
826 stars 197 forks

Add proper barriers around FSDP checkpointing #621

Closed seungkyoon closed 9 months ago

seungkyoon commented 9 months ago

Summary: Add barriers around FSDP checkpointing so that other ranks do not resume training while rank 0 is still writing the checkpoint.

Also add a log message after the checkpoint finishes.
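The barrier pattern described in the summary can be illustrated with a minimal sketch. This is not the code from this PR: `save_with_barriers`, `simulate`, and the `save_fn`/`log` callbacks are hypothetical names, and ranks are simulated with threads and `threading.Barrier` so the example runs without a GPU or process group. In a real distributed job, the `barrier` argument would presumably be `torch.distributed.barrier`.

```python
import threading
import time


def save_with_barriers(rank, barrier, save_fn, log):
    """Hypothetical helper: fence a rank-0 checkpoint write with two barriers."""
    barrier()  # every rank reaches the checkpoint step before writing starts
    if rank == 0:
        save_fn()
        log("checkpoint finished")  # log only after the write has completed
    barrier()  # no rank resumes training until rank 0 is done


def simulate(world_size=4):
    """Simulate `world_size` ranks with threads and return the ordered event log."""
    events = []
    lock = threading.Lock()
    sync = threading.Barrier(world_size)  # stand-in for a collective barrier

    def record(msg):
        with lock:
            events.append(msg)

    def slow_save():
        time.sleep(0.05)  # pretend the checkpoint write takes a while
        record("saved")

    def worker(rank):
        save_with_barriers(rank, sync.wait, slow_save, record)
        record(f"rank {rank} resumes training")

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    for event in simulate():
        print(event)
```

Without the second barrier, ranks 1 and above could log "resumes training" before rank 0 has finished saving. With it, the save and its log line always come first, whatever order the ranks resume in afterwards.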

Differential Revision: D49541229

facebook-github-bot commented 9 months ago

This pull request was exported from Phabricator. Differential Revision: D49541229

facebook-github-bot commented 9 months ago

This pull request has been merged in facebookresearch/d2go@279185539d40cb4847d7094c12d871f98147c9c0.