Hi, thank you for your wonderful work!
I have a question about world model training. Looking at world_model.py, it seems you are masking the tokens output by the tokenizer. Is the world model learning a masked-token prediction objective? This appears different from standard world model training as presented in Dreamer and related work.
Sincerely.
Hi, thanks for your interest! The world model is trained to autoregressively predict the tokens of the next frame, one token at a time, whereas DreamerV2 predicts the full discrete latent state in one shot.
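For intuition, here is a minimal sketch of this kind of autoregressive next-frame token prediction with teacher forcing and a causal attention mask. All shapes, module choices, and names (`frame_t`, `tok_embed`, etc.) are illustrative assumptions, not the repo's actual architecture; the real model in world_model.py is more involved (for example, it also conditions on actions), which this sketch omits.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes for illustration (not the repo's actual config):
# each frame is encoded by the tokenizer into K discrete tokens drawn
# from a vocabulary of size V.
B, K, V = 4, 16, 512          # batch size, tokens per frame, vocab size
embed_dim = 128

# Stand-in for the causal transformer in world_model.py.
transformer = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True),
    num_layers=2,
)
tok_embed = torch.nn.Embedding(V, embed_dim)
head = torch.nn.Linear(embed_dim, V)

# Tokens of two consecutive frames, t and t+1 (random stand-ins here).
frame_t   = torch.randint(0, V, (B, K))
frame_tp1 = torch.randint(0, V, (B, K))

# Teacher forcing: condition on frame t plus all but the last token of
# frame t+1, and predict each token of frame t+1 from what precedes it.
inputs  = torch.cat([frame_t, frame_tp1[:, :-1]], dim=1)   # (B, 2K-1)
targets = frame_tp1                                        # (B, K)

# Causal mask so each position attends only to earlier positions,
# enforcing the autoregressive factorization.
seq_len = inputs.size(1)
causal_mask = torch.nn.Transformer.generate_square_subsequent_mask(seq_len)

hidden = transformer(tok_embed(inputs), mask=causal_mask)
logits = head(hidden[:, K - 1:])   # the K positions that predict frame t+1
loss = F.cross_entropy(logits.reshape(-1, V), targets.reshape(-1))
loss.backward()
```

In this framing, the "masking" is the causal attention mask of the transformer rather than a BERT-style masked-token objective: every token of the next frame is predicted from all tokens that come before it.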