Hi vwxyzjn,
By default the Impala agent is rate limited so that (in expectation) every sampled experience is trained on. This is what the `samples_per_insert = 1.0` parameter does. I'm wondering if this is what is limiting you in this case. Can you try reducing it to, e.g., 0.25 to see if it has any effect on your training speed?
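For reference, a minimal sketch of where that parameter is set, assuming the JAX agent exposes it through an `impala.IMPALAConfig` dataclass (the exact class name may differ across Acme versions):

```python
from acme.agents.jax import impala

# With samples_per_insert=1.0 (the default), Reverb's rate limiter ties the
# actors' insert rate to the learner's sample rate one-to-one, so each
# experience is trained on once in expectation. Lowering it to 0.25 allows
# roughly four inserts per sample, so the actors block less often.
config = impala.IMPALAConfig(
    samples_per_insert=0.25,
)
```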
Thanks for your interest and question, happy Acming!
Bobak
Hi @bshahr, thanks for the suggestion. I gave it a try, but it did not make a difference. My collaborator suggested maybe this is because the actors only run on CPUs, and therefore action sampling through the ResNet could be slow. Do you think this may be the case?
On a related note, were the IMPALA results shown in the new paper generated by `run_impala.py`? If not, how could I reproduce them?
Btw, I really liked how the preliminaries and background sections are written in the new paper :)
Hi vwxyzjn,
Thanks for the kind words re: the papers!
As for your Impala question, our results were indeed run using the `run_impala.py` script, but in a distributed way, so with the `--run_distributed` flag. If the number of actors is too high, you could potentially reduce it without impacting speed too much.
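In case it helps reproduce this, here is a rough sketch of launching a distributed run with a reduced number of actors via Acme's JAX experiments utilities; the `make_distributed_experiment` and Launchpad calls below are my assumptions about the API that `run_impala.py` wires up internally:

```python
import launchpad as lp
from acme.jax import experiments

# experiment_config is an experiments.ExperimentConfig describing the IMPALA
# agent, its networks, and the environment; run_impala.py builds this itself,
# so its construction is elided here.
experiment_config = ...

program = experiments.make_distributed_experiment(
    experiment=experiment_config,
    num_actors=64,  # reduce this if your machine has fewer cores
)
lp.launch(program, launch_type=lp.LaunchType.LOCAL_MULTI_PROCESSING)
```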
Hope this helps reproduce our results!
Bobak
Hello, I ran into the same problem. Even with the `--run_distributed` flag (which is enabled by default in the latest version), I get something like 120 SPS. Did you find any solution for this?
Thank you.
Hey @Elameri, I did try with `--run_distributed` but encountered low SPS as well.
@bshahr, thank you for your reply. Quick follow-up question: what hardware resources were used to run the IMPALA experiments? Maybe IMPALA produces different results depending on the machine configuration.
Hi vwxyzjn and Elameri,
So the `--run_distributed` flag will distribute the computation, but that'll only help if you have a corresponding amount of compute. Roughly speaking, you should be able to get to 50M actor steps (200M frames) in 2 days if you have 256 actors running on 60 dedicated CPUs and a learner with a dedicated modern GPU (e.g. a V100). (If you're running actors and the learner on the same machine, make sure the actors are not using the GPU.)
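One common way to keep actor processes off the GPU (a general JAX pattern, not something specific to Acme's scripts) is to hide the CUDA devices before JAX initialises:

```python
import os

# Must be set before JAX initialises its backends; with no visible
# CUDA devices, the actor process falls back to the CPU.
os.environ['CUDA_VISIBLE_DEVICES'] = ''

import jax
assert jax.default_backend() == 'cpu'
```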
As for producing different results on different hardware capabilities, we've tried very hard to minimize this. This is why we use Reverb's rate limitation feature in our agents (see the paper for a more detailed discussion). This ensures that you get similar results no matter the relative speeds of your learner/actors.
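As an illustration of what that rate limitation looks like (the table parameters here are made up for the example, not Acme's actual defaults), Reverb exposes it via `SampleToInsertRatio`:

```python
import reverb

# With samples_per_insert=1.0, Reverb blocks whichever side gets too far
# ahead, keeping inserts and samples in lockstep (up to the error_buffer
# slack), so results stay comparable across fast and slow hardware.
table = reverb.Table(
    name='impala_replay',
    sampler=reverb.selectors.Uniform(),
    remover=reverb.selectors.Fifo(),
    max_size=100_000,
    rate_limiter=reverb.rate_limiters.SampleToInsertRatio(
        samples_per_insert=1.0,
        min_size_to_sample=1_000,
        error_buffer=100.0,
    ),
)
```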
Hope this helps! I'll close the issue but feel free to reopen.
Bobak
Hi @bshahr, happy new year! Thanks for your response.
I put up a table below comparing ACME's results with the original IMPALA paper. The original IMPALA paper reported finishing 200M frames in under an hour (for the shallow model that uses the Nature DQN network) and reached a similar level of results to those reported in the ACME papers. Would you mind looking into the runtime difference?
| Env | Espeholt et al., 2018 (IMPALA shallow model; "Note that the shallow IMPALA experiment completes training over 200 million frames in less than one hour") | Espeholt et al., 2018 (IMPALA deep model; runtime unspecified, maybe a bit more than 1 hour?) | ACME (20 Sep 2022 arXiv version) | ACME (Jun 2020 arXiv version) |
| --- | --- | --- | --- | --- |
| Asterix | 29692.50 | 300732.00 | ~200000 ± 50000 | - |
| Breakout | 640.43 | 787.34 | ~450 ± 50 | 700 (12 hours) |
| MsPacman | 6501.71 | 7342.32 | 2000 ± 300 | 5000 (12 hours) |
| SpaceInvaders | 1726.28 | 43595.78 | 12000 ± 12000 | 20000 (12 hours) |
Hello, I was trying to run the IMPALA example with Atari on my personal machine with two GPUs and a 24-core CPU, but the steps per second (SPS) looked suspiciously low to me (around 100 SPS). I am probably missing something obvious and would appreciate your help.
Reproduction
I cloned the latest repo and ran
The output is as follows:
Thank you!