Closed vwxyzjn closed 2 years ago
@vwxyzjn I'll take a look. Strange, but KL divergence syncing was needed to set the right LR. Or do you mean it is enough to calculate it on the rank=0 GPU?
But overall this looks awesome and is much easier to use :) It takes the average person a few hours of pain to install Horovod :)
Thank you @Denys88
> Strange, but KL divergence syncing was needed to set the right LR. Or do you mean it is enough to calculate it on the rank=0 GPU?

Yes, it should be enough to calculate it on the rank=0 GPU, at least in the case of isaacgymenvs, where thousands of envs are available.
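To make the two options concrete, here is a minimal pure-Python sketch that simulates the workers as a list of per-rank KL values. It is not the code from this PR: the `adapt_lr` rule, its threshold and factor, and all function names are assumptions chosen for illustration. In a real `torch.distributed` run, the averaging would roughly correspond to a `dist.all_reduce` on the KL value and the rank=0 variant to a `dist.broadcast` of the resulting LR.

```python
from statistics import mean


def adapt_lr(lr, kl, kl_threshold=0.008, factor=1.5, min_lr=1e-6, max_lr=1e-2):
    """Hypothetical KL-threshold rule: shrink the LR when KL is too large,
    grow it when KL is too small, otherwise leave it unchanged."""
    if kl > 2.0 * kl_threshold:
        return max(lr / factor, min_lr)
    if kl < 0.5 * kl_threshold:
        return min(lr * factor, max_lr)
    return lr


def lr_from_averaged_kl(lr, local_kls):
    # Option A: average KL over all ranks (stand-in for an all-reduce),
    # so every rank derives the same LR from the same averaged value.
    return adapt_lr(lr, mean(local_kls))


def lr_from_rank0_kl(lr, local_kls):
    # Option B: only rank 0 computes KL and picks the LR; the result would
    # then be broadcast to the other ranks.
    return adapt_lr(lr, local_kls[0])


# Illustrative per-rank KL estimates (made-up numbers). With thousands of
# envs per rank, the estimates are assumed to be close to each other.
local_kls = [0.0170, 0.0172, 0.0169, 0.0171]
lr = 3e-4

print("averaged KL :", lr_from_averaged_kl(lr, local_kls))
print("rank 0 only :", lr_from_rank0_kl(lr, local_kls))
```

With per-rank estimates this close, both variants pick the same LR. The rank=0 variant only diverges when a single rank's KL estimate is noisy enough to land on the other side of a threshold, which is less likely when each rank already averages over many envs.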
Follow up with #165 #158
We ran a benchmark with isaacgymenvs, and `torch.distributed` shows consistently better scaling performance in `AllegroHand`.

Notable change: I disabled the stats syncing, since averaging the stats across all workers at every step does not seem to be that important.
You can test it out with
CC @ViktorM @markelsanz14