Closed. Yuxuan-W closed this issue 2 months ago.

Thanks for your inspiring work! Since the dynamic 3DGS is trained from reference videos, I am curious how your reference videos are generated given a single-view image and a mouse motion input.

Hi,
The videos with mouse motion input shown on the website are generated by simulations. The reference videos in Figure 2 of the paper are not conditioned on mouse motion input; they are generated by an image-to-video diffusion model, which needs no mouse-motion conditioning. After distilling the knowledge of the reference videos through inverse simulation, we can generate new motions of the same object.

Thanks for this amazing work! I tried to generate the reference videos from a static image on stablevideo.com (which is powered by Stable Video Diffusion), but I got some weird videos: https://www.stablevideo.com/generate/fde88301-fa03-4f7e-8f14-ee1bdff0e271
Are there any tricks you use to generate good videos?
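For anyone who wants to try the image-to-video step locally instead of through stablevideo.com, Stable Video Diffusion can also be run via the Hugging Face `diffusers` library. This is only a minimal sketch, not the authors' pipeline: the input image path and seed are placeholders, and `motion_bucket_id` / `noise_aug_strength` are the two conditioning knobs that most affect whether the output looks reasonable.

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the SVD image-to-video checkpoint (requires a CUDA GPU and ~7 GB VRAM in fp16).
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

# Placeholder input: SVD expects a 1024x576 conditioning image.
image = load_image("input.png").resize((1024, 576))

generator = torch.manual_seed(42)  # fix the seed so results are reproducible
frames = pipe(
    image,
    decode_chunk_size=8,       # trade VRAM for speed when decoding latents
    motion_bucket_id=127,      # lower values -> less motion, often fewer artifacts
    noise_aug_strength=0.02,   # higher values let the video drift further from the input image
    generator=generator,
).frames[0]

export_to_video(frames, "generated.mp4", fps=7)
```

In practice, resampling with different seeds and lowering `motion_bucket_id` when the motion falls apart are the usual ways to coax a usable reference video out of the model; whether the authors used additional tricks would need their confirmation.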