sixiaozheng opened this issue 1 week ago
Hi, thanks for your interest in our work.
We use DUSt3R to process a 25-frame video clip, which yields the camera pose and point cloud for every frame.
For your test video, you can pass the folder of video frames (it must contain exactly 25 frames) into run_sparse.sh
and delete this line: https://github.com/Drexubery/ViewCrafter/blob/f55d64bbd54cad6a1a1a72610b189d15b9926c87/utils/pvd_utils.py#L236
Then select the frame you want via a simple index operation here: https://github.com/Drexubery/ViewCrafter/blob/f55d64bbd54cad6a1a1a72610b189d15b9926c87/viewcrafter.py#L64
It should then produce a rendered result aligned with your test video.
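The frame-selection step above can be sketched roughly as follows. This is only an illustrative outline, not ViewCrafter's actual code: the names `select_frame`, `frames`, `poses`, `pcds`, and `target_idx` are placeholders I made up; the real index operation lives at the viewcrafter.py line linked above.

```python
# Hypothetical sketch of selecting one frame (plus its DUSt3R outputs)
# from the 25-frame clip. Placeholder names; see viewcrafter.py for the
# real logic.

def select_frame(frames, poses, pcds, target_idx):
    """Pick one frame and its corresponding camera pose and point cloud."""
    assert len(frames) == len(poses) == len(pcds) == 25, "clip must be 25 frames"
    return frames[target_idx], poses[target_idx], pcds[target_idx]

# Toy data standing in for real frames, camera poses, and point clouds:
frames = [f"frame_{i:03d}.png" for i in range(25)]
poses = [f"pose_{i}" for i in range(25)]
pcds = [f"pcd_{i}" for i in range(25)]

frame, pose, pcd = select_frame(frames, poses, pcds, target_idx=0)
```

The point is simply that, once every frame has a pose and point cloud, choosing which frame to render from is a plain list-indexing operation.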
I would like to express my sincere appreciation for your impressive work. The approach and results presented in your paper are inspiring, especially the generated videos that align well with the input sequences.
I have a question regarding reproducing the video generation process using RealEstate10K as depicted in Fig. 3 of your paper. Specifically, I would like to know how I can take the first frame of a RealEstate10K video and the corresponding camera pose sequence as input, render the sequence of frames, and then use the diffusion model to generate the final video.
Could you provide some guidance or example code on how to proceed with this pipeline?
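For concreteness, here is the high-level shape of the pipeline I am asking about, as I understand it from Fig. 3. Every name below (`generate_video`, `render_fn`, `diffusion_fn`) is a placeholder of my own, not ViewCrafter's actual API; the stand-in lambdas only mark where the point-cloud renderer and the diffusion model would plug in.

```python
# Hypothetical outline: render a pose sequence from one input frame,
# then refine the renders with the diffusion model. Placeholder names
# throughout; not the repository's real interface.

def generate_video(first_frame, camera_poses, render_fn, diffusion_fn):
    """Render each camera pose from the first frame, then run diffusion."""
    rendered = [render_fn(first_frame, pose) for pose in camera_poses]
    return diffusion_fn(rendered)

# Toy stand-ins so the outline is runnable:
rendered_video = generate_video(
    first_frame="frame0.png",
    camera_poses=["pose_a", "pose_b", "pose_c"],
    render_fn=lambda frame, pose: (frame, pose),  # stand-in for point-cloud rendering
    diffusion_fn=lambda renders: renders,         # stand-in for the diffusion model
)
```

My question is essentially how to fill in the two stand-ins with the repository's real rendering and diffusion steps for a RealEstate10K first frame and its pose sequence.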