[Open] roggirg opened this issue 2 years ago
@eugenevinitsky Could you help take a look?
Hi, sorry this bug is here! I am out today but this will be definitively fixed by tomorrow afternoon.
I believe the fixes that you have there are correct though.
Thanks for your patience; we're working on getting this merged, but the relevant fixes are in: https://github.com/facebookresearch/nocturne/pull/39
Heads up, though, that code has not been extensively hyperparameter-tuned.
No rush at all, but let us know if this resolves your issue.
Hi @eugenevinitsky,
Everything is running now, thanks for the fixes.
Just out of curiosity before we close this issue: what should the FPS be during training? I'm getting 25-30:
average episode rewards is 0.33026985824108124
maximum per step reward is 0.058307357132434845
Algo rmappo Exp intersection updates 50/1250000 episodes, total num timesteps 4080/100000000.0, FPS 29.
average episode rewards is 2.849382162094116
maximum per step reward is 8.059619903564453
episode reward of rendered episode is: 0.8622641801569368
Algo rmappo Exp intersection updates 55/1250000 episodes, total num timesteps 4480/100000000.0, FPS 25.
average episode rewards is 0.9344396740198135
maximum per step reward is 0.05804213136434555
Algo rmappo Exp intersection updates 60/1250000 episodes, total num timesteps 4880/100000000.0, FPS 26.
average episode rewards is 1.3483695685863495
maximum per step reward is 8.056236267089844
Algo rmappo Exp intersection updates 65/1250000 episodes, total num timesteps 5280/100000000.0, FPS 27.
average episode rewards is 1.1445978283882141
maximum per step reward is 0.057421959936618805
Thanks!
It's hard to say what the normal FPS is; it depends on lots of things. Could you provide more details, such as what machine you are using, what and how many CPU cores you have, what and how many GPUs you have, etc.?
Hey @roggirg, it depends on the number of rollout threads you're using and whether you are using a GPU or just a CPU; the MAPPO code uses an RNN by default and includes the time for backprop when computing the FPS. Can you try increasing the value of algorithm.n_rollout_threads? It should scale roughly linearly with the number of threads or workers.
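As an aside, here is a toy sketch (not Nocturne or MAPPO code; the dummy rollout function and per-step cost are made up) of why collection throughput grows roughly linearly with the number of rollout workers: each worker steps its own copy of the environment, so wall-clock time per collected step drops as workers are added.

```python
import time
from multiprocessing import Pool


def rollout(num_steps: int) -> int:
    """Step a dummy environment; the sleep stands in for real simulation cost."""
    for _ in range(num_steps):
        time.sleep(0.001)  # ~1 ms per env step
    return num_steps


if __name__ == "__main__":
    steps_per_worker = 200
    for n_workers in (1, 2, 4):
        start = time.time()
        with Pool(n_workers) as pool:
            total = sum(pool.map(rollout, [steps_per_worker] * n_workers))
        fps = total / (time.time() - start)
        print(f"{n_workers} worker(s): {fps:.0f} env steps/sec")
```

The absolute numbers here mean nothing; the point is only that total steps per second grows close to linearly until the workers saturate the available cores.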
Ah cool, thanks @eugenevinitsky @xiaomengy. I played around with n_rollout_threads=4 (did not know of its existence) and the FPS jumped up to ~50. FYI, I'm running on a 1080Ti with a 12-core CPU. Thanks for your help.
We're going to re-open this because that's a good deal slower than we expect it to be. @xiaomengy, any chance you could run the line
python examples/on_policy_files/nocturne_runner.py algorithm=ppo algorithm.n_rollout_threads=4
and report the FPS? I don't have GPU access for a little while so I can't check it myself.
Hi Folks,
I'm trying to run "on-policy PPO" using
python examples/on_policy_files/nocturne_runner.py algorithm=ppo
and there are a few issues I'm encountering:

1. `algo` vs. `algorithm`: the config.yml file uses `algorithm`, whereas the script uses `cfg.algo`. Switching `algo` to `algorithm` seems to fix the issue.
2. `wandb_name` seems to be missing from the cfg. To make it work, I just disabled the use of wandb.
3. `len(self.vehicles)` on line 30 throws `AttributeError: 'BaseEnv' object has no attribute 'vehicles'`. Replacing `self.vehicles` with `self.controlled_vehicles` seems to solve the issue. Is this the correct way to fix it? (A sketch of this change follows after the list.)

Thanks for your help.