robjlyons closed this issue 3 years ago.
Plan2Explore is implemented in this code base via `--expl_behavior plan2explore`. The task policy will still be trained on the rewards from the environment, but it will only be used for computing eval scores, not for data collection. You can also set `--expl_until 1e6` if you want to switch to collecting data via the task policy after 1M steps. By default, the exploration policy uses no external rewards, but there is a config for that, too. Check out the exploration section in `configs.yaml`.
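To make the data-collection split concrete, here is a minimal sketch of which policy gathers data at a given step. It is not the repository's code: `collecting_policy` and `evaluation_policy` are hypothetical helpers, and treating `expl_until <= 0` as "never switch" is an assumption chosen to match the default behaviour described above.

```python
def collecting_policy(step, expl_until=0):
    """Return which policy collects environment data at `step`.

    Hypothetical helper. Assumption for this sketch: expl_until <= 0 means
    "never switch", so the exploration policy collects for the whole run.
    """
    if expl_until <= 0 or step < expl_until:
        return "exploration"
    return "task"


def evaluation_policy():
    # The task policy (trained on environment rewards) is the one scored.
    return "task"


for step in (0, 999_999, 1_000_000):
    print(step, collecting_policy(step, expl_until=1e6))
# 0 exploration
# 999999 exploration
# 1000000 task
```

With `expl_until=1e6`, the exploration policy fills the replay buffer for the first million steps and the task policy takes over from then on, while evaluation always uses the task policy.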
Edit: For future reference, the correct `expl_behavior` flag is `--expl_behavior Plan2Explore`; others can find the implementation in `expl.py`. Sorry for the noise!
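For the gist before opening `expl.py`, here is a minimal NumPy sketch of the idea, not the repository's implementation. Plan2Explore rewards the exploration policy where an ensemble of one-step predictors disagrees, and an optional external-reward term can be mixed in. The function names, `use_log`, and the `intr_scale`/`extr_scale` weights are illustrative assumptions. Check the exploration section of `configs.yaml` for the actual keys and defaults. The real code may also normalize each term before mixing, and this sketch omits that step.

```python
import numpy as np


def disagreement_reward(ensemble_preds, use_log=True):
    """Plan2Explore-style intrinsic reward from ensemble disagreement.

    ensemble_preds: array of shape (num_models, batch, feature_dim) holding
    each ensemble member's prediction of the next latent features.
    Returns one reward per batch element: the standard deviation across
    models, averaged over feature dimensions (optionally log-transformed).
    """
    preds = np.asarray(ensemble_preds, dtype=np.float64)
    disag = preds.std(axis=0).mean(axis=-1)  # shape: (batch,)
    if use_log:
        disag = np.log(disag + 1e-8)  # epsilon guards against log(0)
    return disag


def exploration_reward(intr, extr, intr_scale=1.0, extr_scale=0.0):
    # Hypothetical weighted mix; extr_scale=0 means "no external rewards".
    return intr_scale * np.asarray(intr) + extr_scale * np.asarray(extr)


# Two models, batch of one, two features: models disagree by 2 per feature.
preds = np.array([[[0.0, 0.0]], [[2.0, 2.0]]])
print(disagreement_reward(preds, use_log=False))  # [1.]
print(exploration_reward([1.0], [5.0], extr_scale=0.5))  # [3.5]
```

Setting a nonzero external-reward weight is the kind of knob the config mentioned above would control. With the weight at zero, the exploration policy sees only the disagreement signal.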
Is it possible to add intrinsic rewards to this method?
Thanks