danijar / director

Deep Hierarchical Planning from Pixels
https://danijar.com/director/

How to reproduce fig. A.1 #8

Open Cmeo97 opened 1 year ago

Cmeo97 commented 1 year ago

Hi Danijar,

Reading the appendix of Director, I couldn't understand what you mean by providing the reward to the worker. Is there a config I can use to do that? The figure caption says: "When additionally providing task reward to the worker". Does that mean you change the context variable defined in hierarchy.py to include the reward as well? Also, if it works so well, why isn't it the default? Have you tried the same for other tasks as well (e.g. Ant Mazes)?

Thank you so much!

Best, Cristian

jdubkim commented 1 year ago

It is set by default. You can see the config in this line of configs.yaml:

worker_rews: {extr: 1.0, expl: 0.0, goal: 1.0}

Cmeo97 commented 1 year ago

I’m not sure I understood correctly. The default config file has worker_rews: {extr: 0.0, expl: 0.0, goal: 1.0}. Besides, I think what would change by using this line is that the worker would get an additional critic, but I still don’t see how the task reward would be provided to the worker. Also, if it works better for most tasks that don’t have sparse reward profiles, why not use this config in general? Does it lead to worse performance in envs like Ant Maze?

Thank you so much!
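For readers following along, here is a minimal sketch of what changing that setting amounts to, written in plain Python. The default values are the ones quoted in this thread; the `override` helper and the dotted-key syntax are hypothetical illustrations and are not the repository's own config loader.

```python
import copy

# Defaults as quoted in this thread; the rest of this snippet is hypothetical.
DEFAULTS = {"worker_rews": {"extr": 0.0, "expl": 0.0, "goal": 1.0}}


def override(config, dotted_key, value):
    """Return a copy of `config` with the nested key `a.b` set to `value`."""
    result = copy.deepcopy(config)  # leave the defaults untouched
    node = result
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        node = node[name]
    if leaf not in node:
        # Refuse to create new keys so that typos are caught early.
        raise KeyError(dotted_key)
    node[leaf] = value
    return result


config = override(DEFAULTS, "worker_rews.extr", 1.0)
print(config["worker_rews"])
```

Only the `extr` weight changes; the `expl` and `goal` weights keep their defaults.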

jdubkim commented 1 year ago

Oh, my bad. I think I changed it in my local repo. If you change extr to a non-zero value, then I think the worker takes the extrinsic reward from the world model? It's in hierarchy.py. Also, I guess the default is set to 0 by design. Section 2.4 of the paper says: "We make this design choice to demonstrate that the interplay between the manager and the worker is successful across many environments, although we also include ...". If the worker takes the task reward, it can be thought of as cheating, because ideally the worker policy should be rewarded based on whether it reached the goal or not.
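To make the idea concrete, here is a minimal sketch, assuming the entries of worker_rews act as per-source weights on the worker's learning signal. The function name `combine_worker_signal` and the toy numbers are hypothetical, and this is not the code in hierarchy.py, which may combine the sources differently (for example, through separate critics).

```python
import numpy as np


def combine_worker_signal(signals, scales):
    """Weighted sum of per-source signals; sources with zero weight are skipped."""
    total = None
    for name, scale in scales.items():
        if scale == 0.0:
            continue  # a zero weight means this source is ignored entirely
        term = scale * np.asarray(signals[name], dtype=np.float64)
        total = term if total is None else total + term
    if total is None:
        raise ValueError("all scales are zero")
    return total


# Toy per-step signals for a three-step trajectory (made-up values).
signals = {
    "extr": np.array([0.0, 0.0, 1.0]),  # task reward
    "expl": np.array([0.3, 0.2, 0.1]),  # exploration bonus
    "goal": np.array([0.5, 0.8, 0.9]),  # goal-reaching reward
}

default = combine_worker_signal(signals, {"extr": 0.0, "expl": 0.0, "goal": 1.0})
with_task = combine_worker_signal(signals, {"extr": 1.0, "expl": 0.0, "goal": 1.0})
```

With the default weights the worker's signal is the goal reward alone; with extr set to 1.0 the task reward is added on top of it.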

Cmeo97 commented 1 year ago

Thank you so much for your answer! Could you expand a bit on why it would be like cheating? Why can't we give both the manager and the worker access to the extrinsic rewards? Conceptually, although the manager is supposed to have a higher-level perspective and should therefore be able to access the extrinsic rewards, I can't see why the worker shouldn't as well. Could you point me to a paper where this is discussed, or expand on it? Thank you so much!