chainer/chainerrl

ChainerRL is a deep reinforcement learning library built on top of Chainer.

Add simpler examples #125

muupan opened this issue 7 years ago

muupan commented 7 years ago

Current examples have many options, including which env to solve. That may be helpful when tackling a new environment, where you have to tune hyperparameters, but as examples for new users I feel they are too complicated. I think it is better to add simpler examples that each solve a single predefined env. For concreteness, a single-env example might look something like the minimal DQN script below (CartPole-v0; the hyperparameters are illustrative, not tuned).
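A rough sketch of such a self-contained example, using ChainerRL's act_and_train / stop_episode_and_train agent interface; all hyperparameter values here are placeholders for illustration:

```python
import chainer
import chainerrl
import gym
import numpy as np

# Sketch of a self-contained example: one predefined env, no options to parse.
env = gym.make('CartPole-v0')
obs_size = env.observation_space.shape[0]
n_actions = env.action_space.n

q_func = chainerrl.q_functions.FCStateQFunctionWithDiscreteAction(
    obs_size, n_actions, n_hidden_channels=50, n_hidden_layers=2)
optimizer = chainer.optimizers.Adam(eps=1e-2)
optimizer.setup(q_func)

# Illustrative hyperparameters; a real example would ship tuned values.
agent = chainerrl.agents.DoubleDQN(
    q_func, optimizer,
    replay_buffer=chainerrl.replay_buffer.ReplayBuffer(capacity=10 ** 6),
    gamma=0.95,
    explorer=chainerrl.explorers.ConstantEpsilonGreedy(
        epsilon=0.3, random_action_func=env.action_space.sample),
    replay_start_size=500,
    target_update_interval=100,
    phi=lambda x: x.astype(np.float32, copy=False))

for episode in range(200):
    obs = env.reset()
    reward = 0.0
    done = False
    episode_return = 0.0  # sum of rewards for this episode
    while not done:
        action = agent.act_and_train(obs, reward)
        obs, reward, done, _ = env.step(action)
        episode_return += reward
    agent.stop_episode_and_train(obs, reward, done)
    print('episode:', episode, 'return:', episode_return)
```

The point is that everything is fixed: a new user can read it top to bottom and run it without choosing an env or any flags.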

ElliotWay commented 7 years ago

In particular, do you have a good set of parameters for using ACER in ALE?

I've been running your example train_acer_ale.py with the default options, but it gives me worse sample efficiency than A3C (while also taking noticeably longer).

It could be that I've just been using the wrong games. Wang et al. unhelpfully give only an aggregate graph, which says nothing about which specific environments ACER is supposed to show improved sample efficiency in.

muupan commented 7 years ago

@ElliotWay Interesting. Which games did you try? When I tuned train_acer_ale.py, I found it to be much more sample-efficient than A3C on Breakout with the default parameters.

ElliotWay commented 7 years ago

@muupan Breakout, Beam Rider, Pong, and Qbert. I guess if you did get better efficiency on Breakout, there must be something wrong with my setup, though I still get good results from A3C.

muupan commented 7 years ago

@ElliotWay Thank you. It is possible there has been some regression in ChainerRL. It should be investigated.