Closed galnov closed 5 years ago
-crd flag is not working for multi-node. Need to implement a mechanism to sync the checkpoint from the orchestrator into the rollout_workers and trainer for restore.
-crd flag is not working for multi-node. Need to implement a mechanism to sync the checkpoint from the orchestrator into the rollout_workers and trainer for restore.