OpenDriveLab / MPI

[RSS 2024] Learning Manipulation by Predicting Interaction
https://opendrivelab.com/MPI/
MIT License

Some questions about the prediction Transformer? #2

Open Zhangwenyao1 opened 2 weeks ago

Zhangwenyao1 commented 2 weeks ago

Thanks for your great work!

I would like to know what target frame prediction means. Does it mean that you reconstruct the whole image conditioned on the other two images? Are there any visualizations of the reconstructed frames?

retsuh-bqw commented 2 weeks ago

Thanks for your interest in our work!

Yes, we select either the transitional or the final frame as the target frame for reconstruction, conditioned on the other two frames and the task description.

As a representation learning framework, MPI does not prioritize the visual quality of the reconstructed images. Consequently, we do not include qualitative results in our paper.
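
For readers who want the target selection spelled out, here is a minimal sketch of how one training sample could be assembled from a three-frame clip. It is not taken from the MPI codebase: the function name `make_prediction_sample`, the dictionary keys, and the uniform random choice between the two candidate targets are all assumptions made for illustration.

```python
import random
from typing import Any, Dict, Optional

# Hypothetical key names for the three frames of an interaction clip.
FRAME_KEYS = ("initial", "transitional", "final")
# Only the transitional or the final frame may serve as the target.
TARGET_CHOICES = ("transitional", "final")


def make_prediction_sample(
    frames: Dict[str, Any],
    task: str,
    rng: Optional[random.Random] = None,
) -> Dict[str, Any]:
    """Pick a target frame and return it with its conditioning inputs.

    Assumption: the target is drawn uniformly at random; the actual
    sampling strategy used by MPI is not stated in this thread.
    """
    rng = rng or random.Random()
    missing = [key for key in FRAME_KEYS if key not in frames]
    if missing:
        raise KeyError(f"missing frames: {missing}")

    target_key = rng.choice(TARGET_CHOICES)
    # The two remaining frames condition the reconstruction.
    context = {key: frames[key] for key in FRAME_KEYS if key != target_key}
    return {
        "target_key": target_key,
        "target": frames[target_key],
        "context": context,
        "task": task,
    }


if __name__ == "__main__":
    clip = {"initial": "frame_0", "transitional": "frame_1", "final": "frame_2"}
    sample = make_prediction_sample(clip, "open the drawer", random.Random(0))
    print(sample["target_key"], sorted(sample["context"]))
```

In this sketch the frames are opaque placeholders; in practice they would be image tensors, and the model would be trained to reconstruct `target` from `context` and `task`.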