Drexubery / ViewCrafter

Official implementation of "ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis"
Apache License 2.0

Reproducing Video Generation from RealEstate10K as in Fig. 3 #29

Open sixiaozheng opened 1 week ago

sixiaozheng commented 1 week ago

I would like to express my sincere appreciation for your impressive work. The approach and results presented in your paper are inspiring, especially the generated videos that align well with the input sequences.

I have a question regarding reproducing the video generation process using RealEstate10K as depicted in Fig. 3 of your paper. Specifically, I would like to know how I can take the first frame of a RealEstate10K video and the corresponding camera pose sequence as input, render the sequence of frames, and then use the diffusion model to generate the final video.

Could you provide some guidance or example code on how to proceed with this pipeline?

Drexubery commented 4 days ago

Hi, thanks for your interest in our work.

We use DUSt3R to process a 25-frame video clip, which yields the camera pose and point cloud for every frame.
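For reference, a minimal sketch of that DUSt3R step using its public Python API (the checkpoint name, pair strategy, and optimizer settings below are illustrative defaults from the DUSt3R demo, not necessarily the exact values in our pipeline):

```python
# Sketch: recover per-frame camera poses and point clouds with DUSt3R.
import glob
import torch
from dust3r.inference import inference
from dust3r.model import AsymmetricCroCo3DStereo
from dust3r.utils.image import load_images
from dust3r.image_pairs import make_pairs
from dust3r.cloud_opt import global_aligner, GlobalAlignerMode

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AsymmetricCroCo3DStereo.from_pretrained(
    "naver/DUSt3R_ViTLarge_BaseDecoder_512_dpt").to(device)

# 25 frames extracted from the RealEstate10K clip, in temporal order.
frames = load_images(sorted(glob.glob("frames/*.png")), size=512)
pairs = make_pairs(frames, scene_graph="complete", prefilter=None, symmetrize=True)
output = inference(pairs, model, device, batch_size=1)

# Global alignment produces one pose and one point map per frame.
scene = global_aligner(output, device=device,
                       mode=GlobalAlignerMode.PointCloudOptimizer)
scene.compute_global_alignment(init="mst", niter=300, schedule="cosine", lr=0.01)
poses = scene.get_im_poses()   # (25, 4, 4) camera-to-world matrices
pts3d = scene.get_pts3d()      # list of per-frame 3D point maps
```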

For your test video:

1. Pass the folder of video frames (it must contain exactly 25 frames) to run_sparse.sh, and delete this line: https://github.com/Drexubery/ViewCrafter/blob/f55d64bbd54cad6a1a1a72610b189d15b9926c87/utils/pvd_utils.py#L236
2. Select the frame you want to condition on through a simple index operation here (see the sketch below): https://github.com/Drexubery/ViewCrafter/blob/f55d64bbd54cad6a1a1a72610b189d15b9926c87/viewcrafter.py#L64

It should then produce a render result aligned with your test video.
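For step 2, the index operation could look like the following sketch. The variable names are placeholders, not the actual code at viewcrafter.py#L64, so adapt them to whatever that line assigns:

```python
# Hypothetical illustration of the "simple index operation" in viewcrafter.py.
# `images` stands in for the list of loaded frames at that point in the code.
cond_idx = 0                            # e.g. condition on the first frame, as in Fig. 3
images = images[cond_idx:cond_idx + 1]  # keep only the selected conditioning frame
# Keep the full 25-frame pose sequence untouched so the renderer still
# traverses the whole camera trajectory; only the conditioning image changes.
```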