Stability-AI / generative-models

Generative Models by Stability AI
MIT License
23.14k stars 2.56k forks

Large memory requirement when implementing SDS loss using SV3D #356

Open yifliu3 opened 1 month ago

yifliu3 commented 1 month ago

Dear devs,

Thanks for open-sourcing this great work! I am trying to implement the 3D reconstruction part of the paper, but I'm running into problems implementing the SDS loss. Since the SDS loss is computed on the latents, I have to retain gradients through the VAE encoder, which is extremely memory-expensive (about 0.4~0.5 GB per frame, 21 frames in total). That works out to roughly 80-100 GB of GPU memory, which is hard to fit on a common GPU. So I'm wondering if you have any tricks to reduce the memory? Thanks a lot.
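For what it's worth, a common way to cut activation memory when gradients must flow through an encoder is per-frame gradient checkpointing: encode one frame at a time and recompute the encoder's intermediate activations during backward instead of storing them. The sketch below is a minimal illustration, not the repo's API; `vae_encode` is a hypothetical stand-in for the SV3D VAE's encode function.

```python
import torch
from torch.utils.checkpoint import checkpoint

def encode_frames(vae_encode, frames):
    # frames: (T, C, H, W) rendered views that require grad w.r.t. the 3D rep.
    latents = []
    for frame in frames:
        # Checkpointing drops the encoder's intermediate activations and
        # recomputes them in the backward pass, trading compute for memory.
        z = checkpoint(vae_encode, frame.unsqueeze(0), use_reentrant=False)
        latents.append(z)
    return torch.cat(latents, dim=0)
```

This keeps only one frame's activations live at a time; combining it with mixed precision (`torch.autocast`) can shrink memory further at some cost in speed.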

pengc02 commented 1 month ago

same question

fengq1a0 commented 2 weeks ago


I have the same question. Do you have any solutions?

And I'm still confused about some details of SDS with SV3D. I would think we should render 21 images of the 3D representation, add noise, and denoise them with SV3D. However, the paper says "We sample a random camera". Is it possible to add noise and denoise a single image? I believe a temporal attention trained on 21 frames won't work well on fewer frames (like 4-5 frames). So, do you have any tricks? Thanks.
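For reference, the standard SDS gradient (as in DreamFusion) can be written against the latents directly; the sketch below is my own hedged illustration, not the paper's implementation. `denoiser` is a hypothetical stand-in for the frozen SV3D UNet, and the weighting `w = 1 - alpha_bar_t` is one common choice among several. The key detail is that the score difference is detached, so no gradient flows back through the diffusion model itself, only through the rendered latents.

```python
import torch

def sds_loss(latents, denoiser, alphas_cumprod, t):
    # Forward-diffuse the latents to timestep t.
    a_t = alphas_cumprod[t]
    noise = torch.randn_like(latents)
    noisy = a_t.sqrt() * latents + (1 - a_t).sqrt() * noise
    # Query the frozen diffusion prior; no grad through the UNet.
    with torch.no_grad():
        eps_pred = denoiser(noisy, t)
    w = 1 - a_t  # one common weighting choice (an assumption here)
    grad = w * (eps_pred - noise)
    # Reparameterize so that d(loss)/d(latents) equals `grad`.
    return (grad.detach() * latents).sum()
```

Whether this is applied to all 21 rendered views at once or to a randomly sampled camera per step is exactly the ambiguity raised above; the sketch itself is agnostic to the batch dimension.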