[Open] roggirg opened this issue 2 years ago
@eugenevinitsky Could you help take a look?
Hi, sorry this bug is here! I am out today but this will be definitively fixed by tomorrow afternoon.
I believe the fixes that you have there are correct though.
Thanks for your patience; we're working on getting this merged, but the relevant fixes are in: https://github.com/facebookresearch/nocturne/pull/39
Heads up, though, that code has not been extensively hyperparameter-tuned.
No rush at all, but let us know if this resolves your issue.
Hi @eugenevinitsky,
Everything is running now, thanks for the fixes.
Just out of curiosity before we close this issue: what should the FPS be during training? I'm getting 25-30:
average episode rewards is 0.33026985824108124
maximum per step reward is 0.058307357132434845
Algo rmappo Exp intersection updates 50/1250000 episodes, total num timesteps 4080/100000000.0, FPS 29.
average episode rewards is 2.849382162094116
maximum per step reward is 8.059619903564453
episode reward of rendered episode is: 0.8622641801569368
Algo rmappo Exp intersection updates 55/1250000 episodes, total num timesteps 4480/100000000.0, FPS 25.
average episode rewards is 0.9344396740198135
maximum per step reward is 0.05804213136434555
Algo rmappo Exp intersection updates 60/1250000 episodes, total num timesteps 4880/100000000.0, FPS 26.
average episode rewards is 1.3483695685863495
maximum per step reward is 8.056236267089844
Algo rmappo Exp intersection updates 65/1250000 episodes, total num timesteps 5280/100000000.0, FPS 27.
average episode rewards is 1.1445978283882141
maximum per step reward is 0.057421959936618805
Thanks!
It's hard to say what the normal FPS is; it depends on lots of things. Could you provide more details, such as what machine you are using, what and how many CPU cores you have, what and how many GPUs you have, etc.?
Hey @roggirg, it depends on the number of rollout threads you're using and whether you are using a GPU or just a CPU; the MAPPO code uses an RNN by default and includes the time for backprop when computing the FPS. Can you try increasing the value of algorithm.n_rollout_threads? It should scale roughly linearly with the number of threads or workers.
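As an aside, here is a toy sketch (not Nocturne or MAPPO code; the dummy rollout function and per-step cost are made up) of why collection throughput grows roughly linearly with the number of rollout workers: each worker steps its own copy of the environment, so wall-clock time per collected step drops as workers are added.

```python
import time
from multiprocessing import Pool


def rollout(num_steps: int) -> int:
    """Step a dummy environment; the sleep stands in for real simulation cost."""
    for _ in range(num_steps):
        time.sleep(0.001)  # ~1 ms per env step
    return num_steps


if __name__ == "__main__":
    steps_per_worker = 200
    for n_workers in (1, 2, 4):
        start = time.time()
        with Pool(n_workers) as pool:
            total = sum(pool.map(rollout, [steps_per_worker] * n_workers))
        fps = total / (time.time() - start)
        print(f"{n_workers} worker(s): {fps:.0f} env steps/sec")
```

The absolute numbers here mean nothing; the point is only that total steps per second grows close to linearly until the workers saturate the available cores.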
Ah cool, thanks @eugenevinitsky @xiaomengy. I played around with n_rollout_threads=4 (did not know of its existence) and the FPS jumped up to ~50. FYI, I'm running on a 1080Ti with a 12-core CPU. Thanks for your help.
We're going to re-open this because that's a good deal slower than we expect it to be. @xiaomengy, any chance you could run the line
python examples/on_policy_files/nocturne_runner.py algorithm=ppo algorithm.n_rollout_threads=4
and report the FPS? I don't have GPU access for a little while so I can't check it myself.
Hi Folks,
I'm trying to run "on-policy PPO" using
python examples/on_policy_files/nocturne_runner.py algorithm=ppo
and there are a few issues I'm encountering:

1. `algo` vs. `algorithm`: the config.yml file uses `algorithm`, whereas the script uses `cfg.algo`. Switching `algo` to `algorithm` seems to fix the issue.
2. `wandb_name` seems to be missing from the cfg. To make it work, I just disabled the use of wandb.
3. `len(self.vehicles)` on line 30 throws `AttributeError: 'BaseEnv' object has no attribute 'vehicles'`. Replacing `self.vehicles` with `self.controlled_vehicles` seems to solve the issue. Is this the correct way to fix it? (A sketch of this change follows after the list.)

Thanks for your help.