evgiz / learning-rl

Essential methods in reinforcement learning (VPG, DQN, PPO)
MIT License

lunar lander question #1

Closed rapop closed 4 years ago

rapop commented 5 years ago

Hi, did you use the same code for Lunar Lander (LunarLanderContinuous-v2)? Are the hyperparameters the same? I am also trying to implement PPO on the continuous Lunar Lander, and it doesn't seem to converge.

Thanks

evgiz commented 4 years ago

Hi,

Sorry for not seeing this earlier; I just happened to notice the issue today. The lunar lander problem was solved using the same code as the bipedal walker (located at ./gym/bipedalwalker-v2/ppo.py). The same hyperparameters should converge in the lunar lander environment, but it is probably a good idea to experiment with a smaller network, since the problem is somewhat easier: its action and state spaces are both smaller.
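To give a rough sense of the size difference, here is a minimal sketch that counts the parameters of a fully connected policy network for both environments. The observation and action dimensions are those of the standard Gym environments (24 and 4 for BipedalWalker-v2, 8 and 2 for LunarLanderContinuous-v2). The helper `mlp_param_count` and the hidden-layer sizes are hypothetical, chosen only for illustration, and are not taken from ppo.py.

```python
# (observation size, action size) of the two Gym environments.
ENV_DIMS = {
    "BipedalWalker-v2": (24, 4),
    "LunarLanderContinuous-v2": (8, 2),
}


def mlp_param_count(obs_dim, act_dim, hidden_sizes):
    """Count the weights and biases of a fully connected network."""
    sizes = [obs_dim, *hidden_sizes, act_dim]
    # Each layer has n_in * n_out weights plus n_out biases.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


# Hypothetical hidden-layer sizes to compare; not the values used in ppo.py.
CANDIDATE_HIDDEN_SIZES = [(64, 64), (32, 32)]

for env_name, (obs_dim, act_dim) in ENV_DIMS.items():
    for hidden in CANDIDATE_HIDDEN_SIZES:
        n_params = mlp_param_count(obs_dim, act_dim, hidden)
        print(f"{env_name} hidden={hidden}: {n_params} parameters")
```

This only compares network sizes and says nothing about convergence, so treat it as a starting point for choosing which smaller networks to try.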

If you have any other questions don't hesitate to reach out, but I'll close this issue for now!