Drexubery / ViewCrafter

Official implementation of "ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis"
Apache License 2.0
752 stars 24 forks

Question about reconstruction strategy in iterative novel view synthesis #21

Open tsuJohc opened 1 week ago

tsuJohc commented 1 week ago

Sorry for opening a new issue; I was afraid you might not see my follow-up question otherwise.

Could I ask for more details?

If we already have an initial DUSt3R reconstruction from the input sparse views, what is the strategy for DUSt3R reconstruction and alignment as the iterations progress?

(1) One way I can think of is to feed all input sparse views and all generated views to DUSt3R and output the globally aligned scene at each step. However, this becomes prohibitively expensive in GPU resources once the number of images exceeds 30.

(2) Another way would be to treat the initial DUSt3R reconstruction as a global reconstruction and reconstruct the 25 newly generated novel views as a local one. However, aligning the global and local reconstructions requires correspondences. How do you obtain them?
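To make the cost concern in (1) concrete, here is a rough sketch of how the number of image pairs grows with the number of views. It assumes the reconstruction considers every ordered pair of images, which is an assumption for illustration and not something stated in this thread; `ordered_pair_count` is a hypothetical helper, not part of ViewCrafter or DUSt3R.

```python
def ordered_pair_count(num_views: int) -> int:
    """Number of ordered image pairs (i, j) with i != j among num_views images.

    Assumption: every image is paired with every other image in both
    directions. The actual pairing strategy may differ.
    """
    if num_views < 0:
        raise ValueError("num_views must be non-negative")
    return num_views * (num_views - 1)


if __name__ == "__main__":
    # Growth is quadratic: doubling the views roughly quadruples the pairs.
    for n in (3, 10, 30, 60):
        print(f"{n} views: {ordered_pair_count(n)} ordered pairs")
```

Under this assumption, keeping the number of views fed to each reconstruction small matters far more than the cost of any single pair.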

Thanks!

zzhang2816 commented 1 week ago

What is the difference between the nvs_single_view_ref_iterative function and the nvs_single_view_1drc_iterative function?

Drexubery commented 1 week ago

Hi, sorry for the late response. We run DUSt3R on all existing and generated views at each step. Instead of using all 25 generated frames, we sample 3 or 5 views from them for reconstruction.
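As a minimal sketch of the sampling step described above: the reply does not say how the 3 or 5 views are chosen, so evenly spaced indices are assumed here. `sample_frame_indices` and `views_for_reconstruction` are hypothetical names, not functions from the ViewCrafter code.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_frame_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick num_samples evenly spaced indices from range(num_frames).

    Assumption: uniform spacing that includes the first and last frame.
    """
    if not 1 <= num_samples <= num_frames:
        raise ValueError("need 1 <= num_samples <= num_frames")
    if num_samples == 1:
        return [num_frames - 1]
    # Integer arithmetic keeps indices distinct and avoids rounding surprises.
    return [(i * (num_frames - 1)) // (num_samples - 1) for i in range(num_samples)]


def views_for_reconstruction(
    existing_views: Sequence[T], generated_frames: Sequence[T], num_samples: int
) -> List[T]:
    """Combine all existing views with a sparse sample of the new frames."""
    indices = sample_frame_indices(len(generated_frames), num_samples)
    return list(existing_views) + [generated_frames[i] for i in indices]


if __name__ == "__main__":
    print(sample_frame_indices(25, 3))
    print(sample_frame_indices(25, 5))
```

With this scheme the view set grows by 3 or 5 images per step instead of 25, which keeps each full reconstruction far smaller than it would be if every generated frame were added.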

Drexubery commented 1 week ago

Also, the nvs_single_view_ref_iterative function and the nvs_single_view_1drc_iterative function seem to be identical.

The nvs_single_view_ref_iterative function performs view synthesis that always starts from the reference image. For example, the first step involves a left-turning trajectory that starts from the reference image, and all subsequent steps also start from the reference image. In contrast, the nvs_single_view_1drc_iterative function uses the last frame of the previously generated novel views as the starting point for the next step.
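The difference can be illustrated with a toy sketch that tracks only which frame each step starts from. Frames are plain string labels, the clip generator is a stand-in, and the mode names `"ref"` and `"1drc"` are hypothetical labels borrowed from the two function names. This is not the repository's implementation.

```python
from typing import List


def start_frames(mode: str, reference: str, num_steps: int,
                 frames_per_step: int = 25) -> List[str]:
    """Return the starting frame used at each iterative step.

    mode "ref":  every step starts from the reference image.
    mode "1drc": each step starts from the last frame of the previous clip.
    """
    if mode not in ("ref", "1drc"):
        raise ValueError(f"unknown mode: {mode}")

    starts = []
    start = reference
    for step in range(num_steps):
        starts.append(start)
        # Stand-in for the video model: label the frames of this step's clip.
        clip = [f"step{step}_frame{i}" for i in range(frames_per_step)]
        if mode == "1drc":
            start = clip[-1]  # chain from the end of the clip just generated
        # In "ref" mode, start is left unchanged.
    return starts


if __name__ == "__main__":
    print(start_frames("ref", "ref_img", 3))
    print(start_frames("1drc", "ref_img", 3))
```

In the `"ref"` case every trajectory fans out from the same image, while in the `"1drc"` case the trajectories are chained end to end.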

saetlan commented 3 days ago

Also, the nvs_single_view_ref_iterative function and the nvs_single_view_1drc_iterative function seem to be identical.

The nvs_single_view_ref_iterative function performs view synthesis that always starts from the reference image. For example, the first step involves a left-turning trajectory that starts from the reference image, and all subsequent steps also start from the reference image. In contrast, the nvs_single_view_1drc_iterative function uses the last frame of the previously generated novel views as the starting point for the next step.

From my understanding of the code, the iterative process generates 25 frames 25 times, right (since the video model generates 25 frames per iteration)? But each time, we replace the point cloud projection with the newly inpainted view that was generated in the previous iteration. Did I understand the process correctly?