1tac11 opened this issue 1 year ago
Mainly for clarification, please.
Ah ok, I see: num_actors = num_envs.
Sorry to bother again,
so how is self.seq_len connected to horizon_length?
And shouldn't
self.minibatch_size_per_env = self.config.get('minibatch_size_per_env', 0)
be
self.minibatch_size_per_env = self.config.get('minibatch_size_per_env', self.minibatch_size // self.num_actors)
instead (also in a2c_common.py)?
One more question please:
What does the parallel calculation with torchrun do?
The problem was that when I let Ant run on 4 machines in parallel, it does not train four times as fast but only twice as fast.
As I understand it, in the forward pass samples are created on every GPU, while in the backward pass the batches are computed in parallel, right? Then there should be almost no overhead from parallelization.
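For context, the pattern I have in mind is the usual torchrun/DDP one; a minimal sketch (the model, data, and loss below are placeholders, not rl_games code):

```python
# Minimal DDP sketch of "rollout per GPU, synchronized backward".
# Launch with e.g.: torchrun --nnodes=4 --nproc_per_node=1 this_script.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # reads MASTER_ADDR/PORT set by torchrun
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = f"cuda:{local_rank}"

    model = torch.nn.Linear(64, 8).to(device)      # stand-in for the actor-critic
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.Adam(ddp_model.parameters(), lr=3e-4)

    for _ in range(10):
        # Forward / rollout: each rank produces its own samples independently.
        obs = torch.randn(1024, 64, device=device)  # stand-in for env observations
        loss = ddp_model(obs).pow(2).mean()         # stand-in for the PPO loss

        # Backward: DDP all-reduces (averages) gradients across all ranks here;
        # this communication is the part that does not scale for free.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```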
Hi @1seck! horizon_length should be divisible by self.seq_len, so the maximum value seq_len can take equals horizon_length, but it can also be a fraction of it.
As for self.minibatch_size_per_env, it's not used anywhere except for the self.minibatch_size calculation when minibatch_size is not set. With the default value of 0 we could, in theory, add some additional checks, but they are not currently used.
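To make the relationship concrete, here is a small sketch of how the defaults combine (illustrative numbers only; the minibatch_size line mirrors the a2c_common.py snippet quoted in this thread, the batch_size line is my reading of the same file):

```python
num_actors = 4096          # = num_envs for a single-agent task
num_agents = 1
horizon_length = 16
seq_len = 4                # must divide horizon_length; at most equal to it
assert horizon_length % seq_len == 0

minibatch_size_per_env = 8                               # hypothetical config value
minibatch_size = num_actors * minibatch_size_per_env     # default when 'minibatch_size' is unset
batch_size = horizon_length * num_actors * num_agents    # samples collected per rollout

assert batch_size % minibatch_size == 0                  # required by the training loop
print(minibatch_size, batch_size)                        # 32768 65536
```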
> What does the parallel calculation with torchrun do?
> The problem was that when I let Ant run on 4 machines in parallel, it does not train four times as fast but only twice as fast.
What metrics are you talking about? FPS step and step_and_inference should scale almost linearly with the number of GPUs. Total FPS scaling won't be linear, since gradients additionally have to be moved between the different GPUs.
And what are the numbers you got?
Hi viktorM,
Thank you for responding. It seems fine as long as I am on one machine with multiple GPUs, but when trying different machines with the master_addr
and port args, the weights are not shared and the worker nodes show the same best reward at step n as in single-machine training. I am comparing the best reward at a certain step n.
Even with four GPUs on one machine, it seems like the best reward only improves about twice as fast.
I will check again tomorrow to double-check but that’s how the training went last week.
Kind regards
8 GPUs: epoch 200: 5900, epoch 500: 8400
1 GPU: epoch 200: 4100, epoch 500: 6637
Regards
I mean, I don’t know whether it syncs at all when distributing over several instances.
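For what it's worth, one way to check whether the weights actually stay in sync is to gather a parameter checksum from every rank and compare; a minimal sketch using plain torch.distributed (call it from inside each training process after the process group is initialized; the model argument is whatever network the trainer holds):

```python
import torch
import torch.distributed as dist

def check_weight_sync(model: torch.nn.Module) -> None:
    # One scalar "fingerprint" of all parameters on this rank.
    device = next(model.parameters()).device
    local_sum = torch.zeros(1, device=device)
    for p in model.parameters():
        local_sum += p.detach().float().sum()

    # Collect every rank's fingerprint and compare on rank 0.
    sums = [torch.zeros_like(local_sum) for _ in range(dist.get_world_size())]
    dist.all_gather(sums, local_sum)

    if dist.get_rank() == 0:
        values = [s.item() for s in sums]
        print("parameter sums per rank:", values)
        print("in sync:", all(abs(v - values[0]) < 1e-4 for v in values))
```

If the printed sums differ between ranks, the weights are not being synchronized across instances.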
Hi there,
in a2c_common.py line 194:
self.minibatch_size = self.config.get('minibatch_size', self.num_actors * self.minibatch_size_per_env)
shouldn't it be
self.minibatch_size = self.config.get('minibatch_size', self.num_envs * self.minibatch_size_per_env)
instead?
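A small illustration of why the two forms coincide in the usual case (following the earlier note that num_actors = num_envs; the numbers are made up):

```python
# For a single-agent vectorized task, num_actors == num_envs,
# so both default expressions evaluate to the same value.
num_envs = 4096
num_actors = num_envs
minibatch_size_per_env = 8   # hypothetical config value

default_current = num_actors * minibatch_size_per_env   # what a2c_common.py line 194 computes
default_proposed = num_envs * minibatch_size_per_env    # the suggested alternative
print(default_current == default_proposed)              # True when num_actors == num_envs
```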