yifliu3 opened this issue 6 months ago
same question
Dear devs,
Thanks for open-sourcing this great work! I am trying to implement the 3D reconstruction part of the paper, but I ran into problems implementing the SDS loss. Since the SDS loss is computed on the latents, I have to retain the gradients of the VAE encoder, which is extremely memory-expensive (about 0.4~0.5 GB per frame, 21 frames in total). That seems to cost 80-100 GB of GPU memory, which is hard to fit on a common GPU. So I'm wondering if you have any tricks to reduce the memory? Thanks a lot.
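One common trick (not from the SV3D code, just a sketch of a standard memory-saving pattern): because the SDS gradient decomposes per frame, you don't need one autograd graph spanning all 21 VAE encodes. You can encode and backprop one frame at a time and accumulate gradients, keeping roughly 1/21 of the peak activation memory. The toy below mocks the encoder as a linear map and the SDS latent gradient as the latent itself; all names are illustrative, and it only demonstrates that per-frame accumulation matches the full-batch gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))        # stand-in for the frozen VAE encoder
frames = rng.standard_normal((21, 8))  # 21 rendered views

def encode(x):
    return W @ x  # toy "latent"

def sds_grad_latent(z):
    # Stand-in for w(t) * (eps_pred - eps) evaluated at the latent.
    return z

# Full-batch gradient w.r.t. the frames (what retaining the whole graph gives).
full_grad = np.stack([W.T @ sds_grad_latent(encode(f)) for f in frames])

# Frame-by-frame accumulation: identical result, one frame's graph at a time.
acc_grad = np.zeros_like(frames)
for i, f in enumerate(frames):
    z = encode(f)                           # forward for this frame only
    acc_grad[i] = W.T @ sds_grad_latent(z)  # backprop, then discard the graph

assert np.allclose(full_grad, acc_grad)
```

In PyTorch the same idea is a loop that calls `backward()` per frame (or uses `torch.utils.checkpoint` on the encoder) instead of stacking all 21 frames into one differentiable batch.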
I have the same question. Do you have any solutions?
And I'm still confused about some details of SDS with SV3D. I think we should render 21 images of the 3D representation, add noise, and denoise them with SV3D. However, the paper says "We sample a random camera". Is it possible to add noise and denoise a single image? I believe temporal attention trained on 21 frames won't work well on fewer frames (like 4-5). So, do you have any tricks? Thanks.
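One possible reading of "sample a random camera", sketched below purely as an assumption (none of this is confirmed by the paper or code): keep cached, no-grad latents for all 21 views so temporal attention still sees a full 21-frame sequence, but per iteration only the randomly sampled view is re-rendered differentiably and noised. The denoiser here is a trivial mock; every name is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_VIEWS, LATENT_DIM = 21, 16  # SV3D generates 21 views

def render(camera_idx):
    # Stand-in for rendering the 3D representation at one camera pose.
    return rng.standard_normal(LATENT_DIM)

# Cache no-grad latents for all 21 views (refreshed every few iterations).
cached_latents = np.stack([render(i) for i in range(NUM_VIEWS)])

def sds_step(cached_latents):
    i = int(rng.integers(NUM_VIEWS))   # "sample a random camera"
    z = render(i)                      # differentiable render of that view only
    t = rng.uniform(0.02, 0.98)        # random noise level
    eps = rng.standard_normal(LATENT_DIM)
    z_noisy = np.sqrt(1 - t) * z + np.sqrt(t) * eps
    # Feed the full 21-frame sequence to the video model, so temporal
    # attention operates on 21 frames; only view i carries gradient.
    seq = cached_latents.copy()
    seq[i] = z_noisy
    eps_pred = seq[i] * 0.9            # mock denoiser output
    return eps_pred - eps              # SDS gradient for view i

g = sds_step(cached_latents)
```

This keeps the temporal-attention input at 21 frames while the per-step differentiable cost is a single view, which would also help with the memory issue above.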
Hello yifliu3,
I have the same issue, may I ask how did you resolve it?
Thank you!