I've noticed a potential misalignment in the p2e_dv2 and p2e_dv3 implementations regarding what the ensemble predicts. According to the Plan2Explore paper, the ensemble should predict the image embedding, not the posterior state. The implementation in p2e_dv1 appears aligned with this:
loss -= next_obs_embedding_dist.log_prob(embedded_obs.detach()[1:]).mean()
However, in p2e_dv2 and p2e_dv3, the ensemble seems to be trained to predict the next (randomized) posterior state instead:
loss -= next_obs_embedding_dist.log_prob(posteriors.view(sequence_length, batch_size, -1).detach()[1:]).mean()
Could this be an intentional modification, or am I missing something about how these predictions should be handled?
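To make the difference concrete, here is a minimal pure-Python sketch of the two training targets. It is not the repository's code: the helper names gaussian_log_prob and ensemble_loss, the unit-variance Gaussian, the toy values, and the assumption that the ensemble emits one prediction per step except the last are all mine, chosen only for illustration.

```python
import math


def gaussian_log_prob(pred, target, std=1.0):
    """Log-density of `target` under an independent Normal(pred, std), summed over features."""
    return sum(
        -0.5 * ((t - p) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)
        for p, t in zip(pred, target)
    )


def ensemble_loss(predictions, targets):
    """Negative mean log-likelihood of the next-step targets.

    predictions[t] is scored against targets[t + 1], mirroring the `[1:]`
    slice in the two loss lines quoted above (assumed alignment).
    """
    next_targets = targets[1:]
    log_probs = [gaussian_log_prob(p, y) for p, y in zip(predictions, next_targets)]
    return -sum(log_probs) / len(log_probs)


# Toy sequence of length 3; every value below is made up.
embedded_obs = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]  # hypothetical encoder outputs
posteriors = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]    # hypothetical flattened posterior samples
predictions = [[1.0, 2.0], [3.0, 4.0]]               # hypothetical ensemble outputs for steps 0 and 1

loss_embedding_target = ensemble_loss(predictions, embedded_obs)  # p2e_dv1-style target
loss_posterior_target = ensemble_loss(predictions, posteriors)    # p2e_dv2 / p2e_dv3-style target
```

The same predictions get a very different loss depending on which tensor is used as the target, so the choice determines what the ensemble disagreement (and therefore the intrinsic reward) ends up measuring. In a real model the embedding and the flattened posterior would usually have different sizes, so the ensemble's output dimension would also have to match whichever target is chosen.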