danijar / dreamerv2

Mastering Atari with Discrete World Models
https://danijar.com/dreamerv2
MIT License

Intrinsic Rewards #14

Closed robjlyons closed 3 years ago

robjlyons commented 3 years ago

Is it possible to add the use of intrinsic rewards to this method?

Thanks

danijar commented 3 years ago

Plan2Explore is implemented in this code base via --expl_behavior plan2explore. The task policy will still be trained on the rewards from the environment, but it will only be used for computing eval scores, not for data collection. You can also set --expl_until 1e6 if you want to switch to collecting data via the task policy after 1M steps. By default, the exploration policy uses no external rewards, but there is a config option for that, too. Check out the exploration section in configs.yaml.
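Putting those flags together, a sketch of the invocation might look like the following. Note this is a hypothetical example: the dreamerv2/train.py entry point and the --configs/--task values follow the repo's usual README pattern and are assumptions, not taken from this thread (a config fragment, not meant to run outside the repo).

```shell
# Sketch: collect data with Plan2Explore, then hand data collection over
# to the task policy after 1M steps (--expl_until 1e6). The task policy
# is still trained on environment rewards and used for eval scores.
python3 dreamerv2/train.py \
  --logdir ~/logdir/atari_pong/plan2explore/1 \
  --configs atari \
  --task atari_pong \
  --expl_behavior Plan2Explore \
  --expl_until 1e6
```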

mjlbach commented 3 years ago

Edit: For future reference, the correct flag is --expl_behavior Plan2Explore (capitalized); others can find the implementation in expl.py. Sorry for the noise!