ZHU-Zhiyu / NVS_Solver

Source code of paper "NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer"

Code question #15

Open · mycfhs opened 5 days ago

mycfhs commented 5 days ago

Hello, I'm trying to understand your paper through the code. Could you tell me which parts of the code correspond to equations (12) and (17)?

mengyou2 commented 4 days ago

Hi, thanks for your interest in our project.

Eq. 12 means that in some regions of the image the pixels come from the warped image, while in the other regions they are generated directly by the diffusion model. In latent space, this means that in some regions the features come from `temp_cond_latents`, while the rest are generated by the diffusion model. The variable λ is used to compute the mask `top_masks`, which marks the regions where `temp_cond_latents` is used. See https://github.com/ZHU-Zhiyu/NVS_Solver/blob/e8433d60a01eccf7cd967281ec42911aa843c4f2/src/diffusers/schedulers/scheduling_euler_discrete.py#L930-L958
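The region-wise blending described in the answer can be sketched as follows. This is a minimal illustration, not the repository's code: `blend_latents` is a hypothetical helper, the latents are NumPy arrays instead of the PyTorch tensors the repo uses, and the mask is taken as a given array. How λ is used to compute `top_masks` lives in the linked lines and is not reproduced here.

```python
import numpy as np


def blend_latents(cond_latents: np.ndarray,
                  generated_latents: np.ndarray,
                  mask: np.ndarray) -> np.ndarray:
    """Hypothetical helper: take warped-condition features where mask == 1
    and diffusion-generated features where mask == 0."""
    if cond_latents.shape != generated_latents.shape:
        raise ValueError("latent shapes must match")
    # Broadcast a spatial mask (e.g. shape (1, H, W)) over all channels (C, H, W).
    mask = np.broadcast_to(mask, cond_latents.shape).astype(cond_latents.dtype)
    return mask * cond_latents + (1.0 - mask) * generated_latents


# Toy example: 2 channels on a 2x2 spatial grid.
cond = np.full((2, 2, 2), 5.0)        # stands in for temp_cond_latents
gen = np.zeros((2, 2, 2))             # stands in for the diffusion output
top_mask = np.array([[[1.0, 0.0],
                      [0.0, 1.0]]])    # shape (1, 2, 2); 1 = use warped features
out = blend_latents(cond, gen, top_mask)
print(out[0])
```

A single multiplicative mask keeps the operation differentiable and lets the same code accept soft (fractional) masks, not only binary ones.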

The solution of Eq. 17, given as Eq. 18, is implemented in: https://github.com/ZHU-Zhiyu/NVS_Solver/blob/e8433d60a01eccf7cd967281ec42911aa843c4f2/svd_interpolate_single_img.py#L1110-L1145
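To illustrate how a closed-form solution can stand in for an optimization problem, here is a sketch under an explicit assumption: suppose the objective were the per-element quadratic sum((x - x_gen)^2) + λ * sum(M * (x - x_warp)^2), where x_gen is the diffusion estimate, x_warp the warped condition and M ≥ 0 a mask. Setting the gradient to zero gives x = (x_gen + λ M x_warp) / (1 + λ M). This objective is assumed for illustration and may not match the paper's Eq. 17 or Eq. 18; the helper names are hypothetical. The code checks the closed form against plain gradient descent on the same objective.

```python
import numpy as np


def closed_form_update(x_gen, x_warp, mask, lam):
    """Hypothetical helper: element-wise minimizer of
    sum((x - x_gen)**2) + lam * sum(mask * (x - x_warp)**2).
    This is an ASSUMED objective, not necessarily the paper's Eq. 17."""
    weight = lam * mask
    return (x_gen + weight * x_warp) / (1.0 + weight)


def gradient_descent(x_gen, x_warp, mask, lam, steps=2000, lr=0.01):
    """Numerical check: minimize the same assumed objective iteratively."""
    x = x_gen.copy()
    for _ in range(steps):
        grad = 2.0 * (x - x_gen) + 2.0 * lam * mask * (x - x_warp)
        x -= lr * grad
    return x


rng = np.random.default_rng(0)
x_gen = rng.normal(size=(4, 4))       # stands in for the diffusion estimate
x_warp = rng.normal(size=(4, 4))      # stands in for the warped condition
mask = (rng.random((4, 4)) > 0.5).astype(float)
lam = 3.0

x_closed = closed_form_update(x_gen, x_warp, mask, lam)
x_numeric = gradient_descent(x_gen, x_warp, mask, lam)
print(np.max(np.abs(x_closed - x_numeric)))  # should be close to zero
```

Under this assumed objective, λ = 0 leaves the diffusion estimate unchanged, while a large λ pushes the masked entries toward x_warp, approaching the hard region-wise replacement described for Eq. 12.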