Eclectic-Sheep / sheeprl

Distributed Reinforcement Learning accelerated by Lightning Fabric
https://eclecticsheep.ai
Apache License 2.0

Question regarding imagination process #272

Closed: anthony0727 closed 7 months ago

anthony0727 commented 7 months ago

I can't fully understand the imagination process in https://github.com/Eclectic-Sheep/sheeprl/blob/main/notebooks/dreamer_v3_imagination.ipynb,

from the "context" of beginning of imagination

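    # save the states imagination_steps steps before the end: the starting point for imagination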
    if i == initial_steps - imagination_steps - 1:
        stochastic_state = player.stochastic_state.clone()
        recurrent_state = player.recurrent_state.clone()

the imagination is performed:

        # imagination step
        imagined_stochastic_state, recurrent_state = world_model.rssm.imagination(
            stochastic_state, recurrent_state, actions
        )

But doesn't stochastic_state_{t-1} have to be fed into world_model.rssm.imagination to output stochastic_state_{t}? I.e., shouldn't it be:

        # imagination step
        stochastic_state, recurrent_state = world_model.rssm.imagination(
            stochastic_state, recurrent_state, actions
        )

I'd really appreciate it if you could help me understand this process!

belerico commented 7 months ago

Hi @anthony0727, that's how the notebook was designed. Suppose we want our agent to play for 200 steps while imagining for 45 steps (the defaults in the notebook). Our final objective is to compare how the imagination differs from the real behaviour: in our example we want to compare the last 45 steps. So:

  1. First the agent plays for initial_steps, saving everything in the rb_initial buffer
  2. At the (initial_steps - imagination_steps - 1)-th step we save the recurrent and stochastic states: these will be used as the starting point for the imagination (see the quick index check after this list)
  3. Then we imagine for imagination_steps. During this phase one can either let the actor imagine actions or take the already played ones; those actions are used to compute the next stochastic and recurrent states from the world model, which are then used to reconstruct the image with the decoder. At the same time we reconstruct the images from the stochastic and recurrent states that were really played, so that we can also compare the reconstructions of the played frames
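
To make the off-by-one in step 2 concrete, here's a quick sanity check with the notebook's defaults (the loop index i is 0-based):

    initial_steps = 200
    imagination_steps = 45

    # the branch point: the last step whose state is computed from real
    # observations before the final `imagination_steps` steps
    branch = initial_steps - imagination_steps - 1  # 154

    # the imagined rollout then covers steps 155..199, i.e. exactly the
    # last 45 real steps we want to compare against
    assert initial_steps - (branch + 1) == imagination_steps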

Is it clearer now?

anthony0727 commented 7 months ago

Thanks for the reply!

But my question is:

why isn't the "next" stochastic state fed back in to compute the one after it, as is done in behavior learning? https://github.com/Eclectic-Sheep/sheeprl/blob/2bae37985d789a67d569bf37ed937b9445ae9ab8/sheeprl/algos/dreamer_v3/dreamer_v3.py#L236

vs

        # imagination step
        imagined_stochastic_state, recurrent_state = world_model.rssm.imagination(
            stochastic_state, recurrent_state, actions
        )

belerico commented 7 months ago

You're super right! Thank you both @anthony0727 and @michele-milesi for spotting that! I thought I was going blind! I'll fix it up right now
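
For reference, the loop should feed each imagined stochastic state back in, roughly like this (a sketch, not the exact notebook code; actions at each step are either imagined by the actor or replayed, as described above):

    for i in range(imagination_steps):
        # feed z_{t-1}, h_{t-1} back in, so that step t consumes the
        # previously imagined stochastic state
        stochastic_state, recurrent_state = world_model.rssm.imagination(
            stochastic_state, recurrent_state, actions
        )

Each imagined (stochastic, recurrent) pair is then decoded to a frame as in the notebook.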

belerico commented 7 months ago

I've opened a new branch here: can you try it out?

anthony0727 commented 7 months ago

Yup, I think the fix is correct! We can close this issue!