Closed YetAnotherPolicy closed 2 years ago
Hello,
Estimating training time is very difficult, since it depends entirely on the training stack, available compute, etc. There is typically a fundamental tradeoff between wall-clock time and compute. On our side, we have tried two very different training stacks: one trained populations in a bit under a week, while the other took just one day. The number of workers was also quite different between the two stacks.
We recognise that compute is likely a limiting factor in training these populations, which is why we are actively working on improving the performance of the substrates, including reducing the time spent in Python and instead delegating to the underlying C++ implementation of the substrate engine (Lab2D) as early as possible.
Hope this helps
Dear @duenez, thanks for the detailed and helpful reply. I appreciate your team's efforts to make MeltingPot a great testbed in MARL research.
@YetAnotherPolicy I am curious to know how long you take to train these populations!
In my case, I can train 1e6 steps in almost exactly an hour using 4 RLlib workers on a machine with 64 GB of RAM and an NVIDIA RTX 3060 GPU.
@YetAnotherPolicy Could you please tell me your hardware specs? I mean, number of CPUs, RAM, GPU? Or do you train in the cloud?
Hi, in my case I use 32 workers and it takes about 8 minutes to run 1M steps. Note that this depends on the simulation speed.
I use very common Intel CPUs, 40 in total. Since the observations are RGB images, I use an A100 GPU, which can be faster than a 3090. RAM is 256 GB.
@YetAnotherPolicy Sorry! I am back with the questions!
Which algorithm are you using to train? I have noticed that in my case PPO is 8 times slower than A3C. Have you experienced anything similar?
Hi, I use PPO. Note that there is an inner training loop in each PPO update; see this link: https://github.com/openai/spinningup/blob/master/spinup/algos/pytorch/ppo/ppo.py#L265. Please also check whether RLlib uses this trick.
Training with PPO takes about 1.5 days for 200M steps.
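For context on why that inner loop matters for speed: PPO takes many gradient steps on each collected batch (Spinning Up defaults to 80 for the policy), while A3C takes roughly one step per batch. A minimal numpy sketch of the idea, with random made-up data standing in for a real rollout batch (none of these names are Melting Pot or RLlib APIs):

```python
import numpy as np

def ppo_clip_loss(ratio, adv, clip_eps=0.2):
    """Negated clipped surrogate objective (lower is better)."""
    return -np.mean(np.minimum(ratio * adv,
                               np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv))

rng = np.random.default_rng(0)
adv = rng.normal(size=256)        # advantages from one rollout batch
logp_old = rng.normal(size=256)   # action log-probs at collection time

# The key point: one PPO "update" reuses the same batch for many inner
# gradient steps, which is why each update costs more than one A3C step.
train_pi_iters = 80               # spinup's default inner-loop length
losses = []
for i in range(train_pi_iters):
    # a real implementation recomputes log-probs with the updated policy;
    # here we just perturb them to stand in for a changing policy
    logp_new = logp_old + 0.001 * i
    ratio = np.exp(logp_new - logp_old)
    losses.append(ppo_clip_loss(ratio, adv))

print(len(losses))  # one batch yields 80 gradient steps
```

So per environment step PPO does much more optimizer work than A3C, which is consistent with the slowdown reported above.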
Hello @YetAnotherPolicy,
I got confused by your last message; I would like to know whether you used the RLlib library to train the workers?
Hi, @yesfon, I did not use RLlib.
May I ask what you used instead?
Hi, I use multiprocessing as well as Ray's remote actors to collect data. RLlib is also good, but it takes a lot of time to learn its APIs.
Dear authors,
Thanks for building such ambitious environments for MARL research. In your paper, I found that the simulation runs for 10^9 steps per agent. To train the agents, how many rollout workers did you use, and how many hours did it take to obtain the final results in Table 1: Focal per-capita returns?
Thank you.