Closed annan-tang closed 1 month ago
Hi @annan-tang,
Thank you for the PR, I'll take a look tomorrow. Could you please update it to the latest master?
Thank you very much, I will update it later. I'm also running experiments to show the effect and will report more results within a few days.
Hi,
I conducted a comparison with and without the central value network initial parameters alignment code on a 2-GPU setting. I used the default Trifinger example in IsaacGymEnvs with the following command:
torchrun --standalone --nnodes=1 --nproc_per_node=2 train.py multi_gpu=True task=Trifinger headless=True seed={xxx}
For each configuration, I tested five random seeds ({xxx}) and found little difference between runs with and without the initial parameter alignment. The reward curves are illustrated below:
Based on these results, the initial parameter alignment appears to have little effect in the 2-GPU setting. However, I'm not sure whether this would change when scaling up to dozens of GPUs.
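For reference, here is a minimal, self-contained sketch of the alignment idea being tested: each rank otherwise initializes its central value network with its own random state, and the fix copies rank 0's initial weights to every other rank before training (in practice this would be a `torch.distributed.broadcast(..., src=0)` at startup). The function names and the plain-list "weights" below are illustrative, not the actual rl_games/IsaacGymEnvs code.

```python
import random

def init_params(seed, n=4):
    # Each simulated rank draws its own random initial weights,
    # so ranks start misaligned by default.
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def broadcast_from_rank0(per_rank_params):
    # Alignment step: overwrite every rank's weights with rank 0's copy,
    # mimicking torch.distributed.broadcast(tensor, src=0) at startup.
    src = per_rank_params[0]
    return [list(src) for _ in per_rank_params]

# Two simulated ranks with different seeds -> different initial weights.
params = [init_params(seed) for seed in (0, 1)]
assert params[0] != params[1]

# After the broadcast, all ranks share rank 0's initialization.
aligned = broadcast_from_rank0(params)
assert aligned[0] == aligned[1] == params[0]
```

Whether this startup alignment matters in practice is exactly what the seed sweep above measures; gradient averaging during training may keep the replicas close even without it.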
Merging it.
Solution for Potential Issues with Multi-GPU/Node Training with Central Network Weights Initialization #296