danijar / dreamerv2

Mastering Atari with Discrete World Models
https://danijar.com/dreamerv2
MIT License

Questions on Imagination MDP and imagination horizon H = 15 #42

Closed GoingMyWay closed 1 year ago

GoingMyWay commented 2 years ago

Dear author,

After reading the code and the paper, I am confused about why the imagination MDP is introduced and why an imagination horizon is needed. For example, with a trained world model and a given trajectory $\tau$, we can sample an initial state and simulate a trajectory with the world model. In DreamerV2, each state in the sampled trajectory is instead used as the start of its own imagined sub-trajectory of length 15, and these sub-trajectories are used to update the policy. Why does this approach work for training the policy in model-based RL? It looks like magic to me. Could you help me understand it?
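To make sure I am reading the code correctly, here is a minimal sketch of the branching I mean (the dynamics, policy, and all names below are dummy stand-ins I made up, not DreamerV2's actual model): every state along a replayed trajectory of length T becomes the start of its own H=15 step imagined rollout.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dummy stand-ins for the learned components (assumptions, not DreamerV2's
# actual networks): a linear latent transition and a tanh policy.
A = rng.normal(size=(8, 8)) * 0.1   # dummy latent dynamics matrix
B = rng.normal(size=(8, 2)) * 0.1   # dummy action-effect matrix

def policy(z):
    return np.tanh(z[:2])           # dummy policy: action from latent state

def transition(z, a):
    return A @ z + B @ a            # dummy world-model step in latent space

H = 15  # imagination horizon, as in the paper

def imagine(start_states, horizon=H):
    """Roll out `horizon` imagined steps from every start state."""
    trajs = []
    for z in start_states:
        traj = [z]
        for _ in range(horizon):
            z = transition(z, policy(z))  # imagined step, no environment
            traj.append(z)
        trajs.append(traj)
    return trajs

# Each of the T posterior states from a replayed trajectory seeds its own
# H-step imagined sub-trajectory used for the policy update.
T = 10
replay_states = [rng.normal(size=8) for _ in range(T)]
imagined = imagine(replay_states)
print(len(imagined), len(imagined[0]))  # T rollouts, each of H+1 states
```

So instead of one long model rollout per sampled trajectory, the policy is trained on T short rollouts of length H. Is that the intended reading?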