Denys88 / rl_games

RL implementations
MIT License

self.minibatch_size #232

Open 1tac11 opened 1 year ago

1tac11 commented 1 year ago

Hi there,

In a2c_common.py, line 194:

self.minibatch_size = self.config.get('minibatch_size', self.num_actors * self.minibatch_size_per_env)

Shouldn't it be

self.minibatch_size = self.config.get('minibatch_size', self.num_envs * self.minibatch_size_per_env)

instead?
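For illustration, here is a minimal, self-contained sketch of the default resolution that line performs (hypothetical standalone code with made-up numbers, not the actual A2CBase class):

```python
# Sketch of how minibatch_size falls back to num_actors * minibatch_size_per_env
# when it is not set explicitly in the config. Values are illustrative only.
config = {
    'num_actors': 8,               # number of parallel environments (num_envs)
    'minibatch_size_per_env': 4,
    # 'minibatch_size': 32,        # if present, this value wins over the fallback
}

num_actors = config['num_actors']                        # == num_envs in rl_games
minibatch_size_per_env = config.get('minibatch_size_per_env', 0)

minibatch_size = config.get('minibatch_size',
                            num_actors * minibatch_size_per_env)
print(minibatch_size)  # 32
```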

1tac11 commented 1 year ago

mainly for clarification please

1tac11 commented 1 year ago

Ah ok, I see, num_actors = num_envs. Sorry to bother again: how is self.seq_len connected to horizon_length? And shouldn't

self.minibatch_size_per_env = self.config.get('minibatch_size_per_env', 0)

be

self.minibatch_size_per_env = self.config.get('minibatch_size_per_env', self.minibatch_size // self.num_actors)

instead (also in a2c_common.py)?
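For comparison, a small sketch of the suggested alternative default, assuming minibatch_size has already been resolved (hypothetical standalone code, not the actual a2c_common.py):

```python
# Sketch of the suggested fallback: derive minibatch_size_per_env as the
# per-environment share of an already-known minibatch_size. Illustrative values.
config = {'minibatch_size': 32768, 'num_actors': 4096}

minibatch_size = config['minibatch_size']
num_actors = config['num_actors']

# Suggested default instead of 0:
minibatch_size_per_env = config.get('minibatch_size_per_env',
                                    minibatch_size // num_actors)
print(minibatch_size_per_env)  # 8
```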

1tac11 commented 1 year ago

One more question, please: what does the parallel computation with torchrun do exactly? The problem is that when I run Ant on 4 machines in parallel, it does not train four times as fast but only twice as fast. As I understand it, samples are collected on every GPU during the forward pass, while in the backward pass the batches are processed in parallel, right? Then there should be almost no overhead from parallelization.
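For reference, a generic sketch of the data-parallel pattern being described here (plain torch.distributed code, not the actual rl_games implementation; it assumes the process group has already been initialized, e.g. by torchrun):

```python
# Conceptual sketch of data-parallel RL training: each rank collects its own
# samples, and only the gradients are synchronized across ranks. The all-reduce
# below is the communication step that keeps scaling from being perfectly linear.
import torch
import torch.distributed as dist

def train_step(model, optimizer, local_batch):
    # Forward: every rank works on its locally collected batch.
    loss = model(local_batch).mean()
    loss.backward()

    # Backward sync: average gradients across all ranks.
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size

    optimizer.step()
    optimizer.zero_grad()
```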

ViktorM commented 1 year ago

Hi @1seck! horizon_length should be divisible by self.seq_len, so the maximum value seq_len can take equals horizon_length, but it can also be an integer fraction of it.

As for self.minibatch_size_per_env, it is not used anywhere except in the self.minibatch_size calculation when that value is not set. With the default value of 0 we could, in theory, add some additional checks, but they are not currently used.
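A minimal sketch of that constraint with made-up numbers (not actual rl_games code):

```python
# seq_len must evenly divide horizon_length; the largest valid seq_len is
# horizon_length itself, and smaller valid values are integer fractions of it.
horizon_length = 32
seq_len = 8

assert horizon_length % seq_len == 0, "horizon_length must be divisible by seq_len"

# For a recurrent policy, each environment's rollout of horizon_length steps
# is split into this many training chunks of length seq_len.
num_chunks_per_env = horizon_length // seq_len
print(num_chunks_per_env)  # 4
```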

ViktorM commented 1 year ago

> What does the parallel computation with torchrun do exactly?
> The problem is that when I run Ant on 4 machines in parallel, it does not train four times as fast but only twice as fast.

What metrics are you talking about? FPS step and step_and_inference should scale almost linearly with the number of GPUs. Total FPS scaling won't be linear, since gradients additionally have to be moved between the different GPUs.

ViktorM commented 1 year ago

And what are the numbers you got?

1tac11 commented 1 year ago

Hi ViktorM, thank you for responding. It seems fine as long as I am on one machine with multiple GPUs, but when I try different machines with the master_addr and port args, the weights are not shared, and the worker nodes show the same best-reward output at step n as in single-machine training. I am comparing the best reward at a certain step n. Even with four GPUs on one machine, it seems the best reward score improves only about twice as fast. I will check again tomorrow to double-check, but that is how the training went last week. Kind regards

1tac11 commented 1 year ago

8 GPUs: epoch 200: 5900, epoch 500: 8400
1 GPU: epoch 200: 4100, epoch 500: 6637

Regards

1tac11 commented 1 year ago

I mean, I don’t know whether it syncs at all when distributing over several instances.
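As a debugging aid, here is a minimal, generic sketch (plain torch.distributed, not part of rl_games) that checks whether the model weights actually match across ranks; it assumes it is launched with torchrun so that master_addr/port and the rank/world-size env vars are already set:

```python
# Sanity check: do all ranks hold the same parameters?
# Broadcast rank 0's copy of each parameter and compare it to the local copy.
import torch
import torch.distributed as dist

def weights_in_sync(model, atol=1e-6):
    in_sync = True
    for p in model.parameters():
        reference = p.detach().clone()
        dist.broadcast(reference, src=0)  # everyone receives rank 0's values
        if not torch.allclose(p.detach(), reference, atol=atol):
            in_sync = False
    return in_sync

if __name__ == "__main__":
    dist.init_process_group(backend="gloo")  # use "nccl" for GPU training
    model = torch.nn.Linear(4, 2)  # un-seeded, so ranks start out of sync
    print(f"rank {dist.get_rank()}: weights_in_sync = {weights_in_sync(model)}")
    dist.destroy_process_group()
```

If the weights diverge across nodes during training, a check like this would report False on the non-zero ranks, which would confirm that gradients are not being synchronized between machines.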