Denys88 / rl_games

RL implementations

Ensure the consistency of central_value_net by using the same initial parameters before training starts in a multi-GPU setting. #297

Closed: annan-tang closed this 1 month ago

annan-tang commented 3 months ago

Solution for potential issues with multi-GPU/node training caused by central value network weight initialization #296
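
For reference, the kind of alignment this PR targets can be sketched as a one-time broadcast of the central value network's weights from rank 0 to all other ranks. This is a minimal illustrative sketch assuming a PyTorch `torch.distributed` setup; `broadcast_initial_params` is a hypothetical helper, not the actual rl_games code:

```python
import torch.distributed as dist

def broadcast_initial_params(model, src_rank=0):
    # Hypothetical helper: copy rank src_rank's weights into every other rank
    # in-place, so all workers start from identical central_value_net parameters.
    # Assumes the default process group is already initialized (e.g. by torchrun).
    for tensor in list(model.parameters()) + list(model.buffers()):
        dist.broadcast(tensor.data, src=src_rank)
```

Called once right after model construction, this makes the initial weights identical across ranks regardless of per-rank seeding; gradient averaging during training then keeps them in sync.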

ViktorM commented 3 months ago

Hi @annan-tang,

Thank you for the PR, I'll take a look tomorrow. Could you please update it to the latest master?

annan-tang commented 3 months ago

> Hi @annan-tang,
>
> Thank you for the PR, I'll take a look tomorrow. Could you please update it to the latest master?

Thank you very much, I will update it later. I'm also running experiments to show the effect and will report more results within several days.

annan-tang commented 3 months ago

Hi,

I compared runs with and without the central value network initial-parameter alignment code in a 2-GPU setting, using the default Trifinger example from IsaacGymEnvs with the following command:

torchrun --standalone --nnodes=1 --nproc_per_node=2 train.py multi_gpu=True task=Trifinger headless=True seed={xxx}

For each configuration, I tested five random seeds ({xxx}) and found little difference between runs with and without the initial-parameter alignment. The reward curves are shown below:

[Figure: reward curves with and without central value network parameter alignment (central_value_net_alignment)]

Based on these results, the initial-parameter alignment appears to have little effect in the 2-GPU setting. However, I'm not sure whether this would change when scaling up to dozens of GPUs.
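
One way to check this directly at larger scale is to compare each rank's weights against rank 0's before the first update. A minimal sketch under the same assumptions as above (`params_consistent` is a hypothetical helper, not part of rl_games):

```python
import torch
import torch.distributed as dist

def params_consistent(model, atol=0.0):
    # Hypothetical check: returns True on every rank iff all ranks hold
    # identical parameters. Flatten local weights, fetch rank 0's copy via
    # broadcast, compare, then combine per-rank results with an all-reduce MIN.
    local = torch.cat([p.detach().flatten() for p in model.parameters()])
    reference = local.clone()
    dist.broadcast(reference, src=0)  # reference now holds rank 0's weights
    same = torch.allclose(local, reference, atol=atol)
    flag = torch.tensor([int(same)], dtype=torch.uint8, device=local.device)
    dist.all_reduce(flag, op=dist.ReduceOp.MIN)  # 0 if any rank diverged
    return bool(flag.item())
```

Running such a check once after initialization on each rank would show whether the divergence this PR addresses actually occurs in a given multi-GPU or multi-node configuration.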

Denys88 commented 1 month ago

merging it.