Closed — anthony0727 closed this 1 month ago
3) Also, the -0.5 is missing here. This is quite critical in the test phase: when testing with the notebook, I couldn't get returns similar to those seen during training, and this was the main cause. The line should be

preprocessed_obs[k] = preprocessed_obs[k] / 255.0 - 0.5

in
preprocessed_obs[k] = torch.as_tensor(v[np.newaxis], dtype=torch.float32, device=fabric.device)
if k in cfg.algo.cnn_keys.encoder:
    preprocessed_obs[k] = preprocessed_obs[k] / 255.0
mask = {k: v for k, v in preprocessed_obs.items() if k.startswith("mask")}
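The mismatch is easy to reproduce on a toy array (a sketch with hypothetical values, not the notebook's actual observations): training divides by 255 and subtracts 0.5, so test-time preprocessing must do both.

```python
import numpy as np

# Toy uint8 observation with values spanning [0, 255].
obs = np.array([0, 128, 255], dtype=np.uint8)

# Buggy test-time version: only scales to [0, 1].
buggy = obs.astype(np.float32) / 255.0

# Training-consistent version: zero-centered in [-0.5, 0.5].
fixed = obs.astype(np.float32) / 255.0 - 0.5
```

Every input the world model sees at test time is shifted by a constant 0.5 relative to training, which is enough to degrade the rollouts.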
4) Shouldn't the fixes below be made to align the buffers' timelines? I'm a bit confused by this, though. I think -1 for actions is correct:
# actions actually played by the agent
actions = torch.tensor(
    rb_initial["actions"][-imagination_steps + i - 1],  # (anthony) subtract 1
    device=fabric.device,
    dtype=torch.float32,
)[None]
I think the original -imagination_steps + i is correct together with the -1 for actions, but instead I got the desired trajectory with -imagination_steps + i + 1 ... confusing.
# reconstruct the observations from the latent states used when interacting with the environment
played_latent_states = torch.cat(
    (
        torch.tensor(rb_initial["stochastic_state"][-imagination_steps + i + 1], device=fabric.device),  # (anthony) add 1
        torch.tensor(rb_initial["recurrent_state"][-imagination_steps + i + 1], device=fabric.device),  # (anthony) add 1
    ),
    -1,
)
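A pure-Python toy (hypothetical step labels, not sheeprl's real buffer contents) of why the latent indices end up shifted by one relative to the actions:

```python
# Because the initial latent state is also stored in the buffer, the latent
# at buffer index t + 1 is the one that was current when actions[t] was
# chosen, so latents need a +1 offset relative to actions.
actions = ["a0", "a1", "a2", "a3"]                  # action chosen at step t
latent_states = ["z_init", "z0", "z1", "z2", "z3"]  # z_t is stored at index t + 1

def played_pair(t):
    """Latent state that generated actions[t], plus the action itself."""
    return latent_states[t + 1], actions[t]
```

Reading both arrays at the same raw index pairs each action with the latent state from one step earlier, which is exactly the misalignment the +1 / -1 adjustments compensate for.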
With the changes above, I now get the desired performance & results!
Hi @anthony0727, regarding 2., we modified Dreamer and these two modifications escaped us. The +1 is correct because there is a mismatch between actions and latent states (due to the inclusion of the initial latent state in the buffer). The problem with the +1 is that the last action is the one in position 0. I will fix these problems as soon as possible, thanks for reporting these.
EDIT: the latent states at index i are generated by the observation at index i-1 and are used to generate the action at index i-1. This is because the initial latent state is buffered. I will modify it because it only creates confusion.

@anthony0727 @michele-milesi thank you both!
Effectively, we save the stochastic and recurrent states that refer to the previous action instead of the ones that have just been generated.
Hi @anthony0727, we created a branch for fixing this issue, can you check if it works? (https://github.com/Eclectic-Sheep/sheeprl/tree/fix/dv3-imagination-notebook)
Thanks
I'll look into it & the new dv3 soon, after my deadline thing!!
I can't find a button to reopen the issue. Regarding #272, I have two questions!
1) Shouldn't the initial stochastic_state be flattened, s.t. in
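As a sketch of what question 1) seems to suggest (the shapes here are assumptions typical of DreamerV3's categorical latent, not read from the notebook):

```python
import numpy as np

# Assumed shapes: (batch, stoch, discrete) for the categorical stochastic
# state and (batch, recurrent) for the recurrent state.
stochastic_state = np.zeros((1, 32, 32), dtype=np.float32)
recurrent_state = np.zeros((1, 512), dtype=np.float32)

# Flatten the stochastic state before concatenating it with the recurrent
# state, so both are rank-2 and can form a single latent vector.
flat_stochastic = stochastic_state.reshape(stochastic_state.shape[0], -1)
latent = np.concatenate((flat_stochastic, recurrent_state), axis=-1)
```

Without the flatten, the concatenation along the last axis fails because the two tensors have different ranks.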
2) Shouldn't 0.5 also be added back to the imagined reconstructed obs? s.t.
in
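Question 2) amounts to undoing the training-time normalization before viewing the reconstruction; a minimal sketch with a toy decoder output (hypothetical values):

```python
import numpy as np

# Toy decoder output: if inputs were normalized as obs / 255 - 0.5, the
# reconstruction lives roughly in [-0.5, 0.5], so 0.5 must be added back
# before rescaling to a displayable uint8 image.
decoded = np.array([-0.5, 0.0, 0.5], dtype=np.float32)
image = np.clip((decoded + 0.5) * 255.0, 0, 255).astype(np.uint8)
```

Skipping the +0.5 here makes the reconstructed images half-black, mirroring the missing -0.5 on the encoder side from point 3).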