danijar / dreamer

Dream to Control: Learning Behaviors by Latent Imagination
https://danijar.com/dreamer
MIT License
513 stars, 110 forks

Dreamer for Atari #33

Closed. michaelzhiluo closed this issue 3 years ago

michaelzhiluo commented 4 years ago

In short, here is the error I get when running atari_breakout:

  File "dreamer.py", line 463, in <module>
    main(parser.parse_args())
  File "dreamer.py", line 443, in main
    functools.partial(agent, training=False), test_envs, episodes=1)
  File "/home/mluo/dreamer/tools.py", line 124, in simulate
    obs, _, done = zip(*[p()[:3] for p in promises])
  File "/home/mluo/dreamer/tools.py", line 124, in <listcomp>
    obs, _, done = zip(*[p()[:3] for p in promises])
  File "/home/mluo/dreamer/wrappers.py", line 350, in step
    obs, reward, done, info = self._env.step(action)
  File "/home/mluo/dreamer/wrappers.py", line 162, in step
    obs, reward, done, info = self._env.step(action)
  File "/home/mluo/dreamer/wrappers.py", line 211, in step
    obs, reward, done, info = self._env.step(action)
  File "/home/mluo/dreamer/wrappers.py", line 320, in step
    raise ValueError(f'Invalid one-hot action:\n{action}')
ValueError: Invalid one-hot action:
[ 0.999  -0.9995  0.9995  0.9995] 

I was wondering what changes are needed to get Atari working in your much cleaner Dreamer codebase, and which hyperparameters would need to change to match the results reported in the paper.
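
For readers following the traceback: the environment wrapper rejects any action that is not exactly one-hot, and the values printed in the error look like a continuous action vector rather than a one-hot choice. Below is a minimal sketch of that kind of check. It is not the code in wrappers.py; the function name is_valid_one_hot and the tolerance are hypothetical and chosen only for illustration.

```python
import math


def is_valid_one_hot(action, tol=1e-6):
    """Return True if `action` is 1 at its largest entry and 0 elsewhere.

    Hypothetical helper for illustration, not part of the Dreamer codebase.
    """
    # Index of the largest entry (the first one if there are ties).
    index = max(range(len(action)), key=lambda i: action[i])
    # Every entry must match the one-hot reference built from that index.
    return all(
        math.isclose(value, 1.0 if i == index else 0.0, abs_tol=tol)
        for i, value in enumerate(action)
    )


# The action from the traceback: several entries are near +1 or -1.
failing_action = [0.999, -0.9995, 0.9995, 0.9995]
# A well-formed discrete action that selects index 2.
valid_action = [0.0, 0.0, 1.0, 0.0]

print(is_valid_one_hot(failing_action))
print(is_valid_one_hot(valid_action))
```

The first vector fails because more than one entry is far from zero, which is what one would expect if the agent is still producing continuous actions for a discrete-action environment.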

IcarusWizard commented 4 years ago

Same problem as #29.

michaelzhiluo commented 4 years ago

Thank you! Does Atari learn well (i.e., replicate the paper's results) with the hyperparameters in dreamer.py?

IcarusWizard commented 4 years ago

Sorry, I didn't run the full Atari experiment, since I don't have enough resources for it 😟 (by my calculation, it needs roughly 1 TB of RAM and weeks of training in my environment). If you have enough resources and want to replicate the results, I suggest trying the parameters in Appendix A of the paper. My settings are --expl epsilon_greedy --horizon 10 --kl_scale 0.1 --action_dist onehot --expl_amount 0.4 --expl_min 0.1 --expl_decay 100000 --pcont 1 --time_limit 1000000. Here time_limit is set large enough to keep rollouts in the Atari environment from being stopped early. You may also need to change the hidden size of the network, as mentioned by Danijar in #7. Good luck!
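
To make the exploration flags above more concrete, here is a rough sketch of how an epsilon-greedy schedule with these three values could behave. It assumes that expl_decay acts as a half-life in steps and that expl_min is a floor on the exploration amount; both are assumptions to check against dreamer.py. The function names exploration_amount and epsilon_greedy are hypothetical.

```python
import random


def exploration_amount(step, expl_amount=0.4, expl_min=0.1, expl_decay=100000):
    """Exploration probability at `step`.

    Assumption: the amount halves every `expl_decay` steps and is
    clipped from below at `expl_min`.
    """
    amount = expl_amount
    if expl_decay:
        amount *= 0.5 ** (step / expl_decay)
    if expl_min:
        amount = max(expl_min, amount)
    return amount


def epsilon_greedy(action, amount, rng=random):
    """With probability `amount`, replace `action` by a uniformly random one-hot."""
    if rng.random() < amount:
        index = rng.randrange(len(action))
        return [1.0 if i == index else 0.0 for i in range(len(action))]
    return list(action)


# Print the schedule at a few points, then sample one action.
for step in (0, 100000, 200000, 1000000):
    print(step, exploration_amount(step))

policy_action = [0.0, 1.0, 0.0, 0.0]
print(epsilon_greedy(policy_action, exploration_amount(0)))
```

Under these assumptions the exploration probability starts at 0.4, halves after 100000 steps, and stays at the 0.1 floor from 200000 steps onward.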

xlnwel commented 3 years ago

DreamerV2 for Atari games is out. Check this repo: https://github.com/danijar/dreamerv2