ZHU-Zhiyu / NVS_Solver

Source code of paper "NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer"

Question about the denoising loop of svd_interpolate_dyn_img.py #18

Open ZJWdk opened 4 months ago

ZJWdk commented 4 months ago

Hello, thanks for your excellent work! I have some questions about the denoising loop in svd_interpolate_dyn_img.py.

In the denoising loop, the unet and the scheduler are each called twice. In the first pass, the process is also divided into four parts and the tensors are sliced. Could you please explain the purpose of each call to the unet and scheduler, and why you chose to slice the tensors this way? https://github.com/ZHU-Zhiyu/NVS_Solver/blob/8cdf310e459c07b9c64b4b20e79254586dbe8267/svd_interpolate_dyn_img.py#L553-L603 https://github.com/ZHU-Zhiyu/NVS_Solver/blob/8cdf310e459c07b9c64b4b20e79254586dbe8267/svd_interpolate_dyn_img.py#L608-L640

mengyou2 commented 4 months ago

Hi, thanks for your interest in our project.

The first call computes the backward gradient in Eq. (14), and the second call computes the denoised result. We divide the first process into four patches to save memory, as we use a 48GB GPU. If you have an 80GB GPU, you can drop the patch division.

ZJWdk commented 4 months ago

I got it. Thank you!

dadwadw233 commented 4 months ago

Is it possible to run the code on a 24GB GPU (like an RTX 4090)?

mengyou2 commented 4 months ago

The code currently splits one image into 4 patches. You may try more patches, such as 8, to reduce memory usage.
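
For readers trying the suggestion above, here is a minimal sketch of the general idea of processing a tensor in patches so that only one patch is held in memory at a time. It is not the code from svd_interpolate_dyn_img.py: `chunk_bounds` and `process_in_chunks` are hypothetical helpers, plain Python lists stand in for tensors, and the sketch assumes the per-patch computation does not depend on the other patches.

```python
def chunk_bounds(length, num_chunks):
    """Return (start, stop) pairs covering range(length) in contiguous pieces.

    Hypothetical helper: the first `length % num_chunks` pieces get one
    extra element so that the sizes differ by at most one.
    """
    if num_chunks < 1 or num_chunks > length:
        raise ValueError("num_chunks must be between 1 and length")
    base, extra = divmod(length, num_chunks)
    bounds = []
    start = 0
    for i in range(num_chunks):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def process_in_chunks(items, num_chunks, fn):
    """Apply fn to one chunk at a time and concatenate the results.

    Assumption: fn treats each chunk independently, so the result equals
    what fn would return on the whole input.
    """
    out = []
    for start, stop in chunk_bounds(len(items), num_chunks):
        out.extend(fn(items[start:stop]))
    return out


# Four patches versus eight patches over a dimension of size 64.
four = chunk_bounds(64, 4)
eight = chunk_bounds(64, 8)
```

With more patches, each call sees a smaller slice, which lowers peak memory at the cost of more calls. Whether the real model tolerates a given split depends on how the sliced dimension is used inside the unet.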

ZJWdk commented 4 months ago

Hello, I have another question. In the code linked below, you calculate $\Delta p$ as tau_ - tau_p, but I don't understand why; the paper does not explain it. Also, you don't calculate $\lambda_t$ according to Equation (18) in your paper. Why the difference, and what do k and b mean here? I would appreciate your explanation. Thank you. https://github.com/ZHU-Zhiyu/NVS_Solver/blob/af820d19bb1b71a3d40a8cd0bcdf2cd2d6f05ce3/svd_interpolate_dyn_img.py#L1119-L1130

mengyou2 commented 4 months ago

For tau_ - tau_p, it's because we generate the poses evenly along the trajectory.
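
To illustrate why even spacing makes an index difference a reasonable stand-in for $\Delta p$, here is a small sketch. It assumes tau_ and tau_p are normalized positions in [0, 1] of the target frame and of a given view along the trajectory; that reading is an assumption, not something confirmed in this thread. `even_positions` and `pose_gap` are hypothetical names, and taking the absolute value is also an assumption.

```python
def even_positions(num_frames):
    """Normalized positions in [0, 1] for frames spaced evenly on a trajectory."""
    if num_frames < 2:
        raise ValueError("need at least two frames")
    return [i / (num_frames - 1) for i in range(num_frames)]


def pose_gap(tau, tau_p):
    """Gap between a target position and a given view's position.

    Assumption: with evenly spaced poses, the pose distance is proportional
    to this difference, so it can serve as a proxy for the pose change.
    """
    return abs(tau - tau_p)


positions = even_positions(5)
# Gap of every frame to the first and to the last frame of the trajectory.
gaps_to_first = [pose_gap(t, positions[0]) for t in positions]
gaps_to_last = [pose_gap(t, positions[-1]) for t in positions]
```

Under this assumption, frames near a given view have a small gap and frames far from it have a large one, without needing to compare the camera matrices directly.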

We add k and b as hyperparameters to make it easier to search for a suitable $\lambda_t$. Adjusting k and b helps us find a good $\lambda_t$.