```python
from torch import autograd

# create_graph=True retains the backward graph so reg_loss can be backpropagated.
grad, = autograd.grad(
    outputs=(fake_img * noise).sum(), inputs=latents, create_graph=True
)
```
This gradient computation is memory-inefficient and is not supported by flash-attention. Consequently, the batch_size has to be reduced when training with the reg_loss.
https://github.com/SHI-Labs/Smooth-Diffusion/blob/5522761bb68fcb6ac1cfaee5a5b855d4a56ea33f/train_smooth_diffusion.py#L312
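For context, here is a minimal sketch of the smoothness-regularization pattern the linked line implements (the names `model`, `latents`, and `noise_scale` are placeholders for illustration, not the variables used in `train_smooth_diffusion.py`): the output is perturbed with noise, differentiated with respect to the latents with `create_graph=True`, and the gradient norm is penalized. Keeping the backward graph alive for this second differentiation is what drives the extra memory use.

```python
import torch
from torch import autograd

def reg_loss_sketch(model, latents, noise_scale=1e-3):
    # The gradient is taken w.r.t. the latents, so they must track gradients.
    latents = latents.detach().requires_grad_(True)
    fake_img = model(latents)

    # (fake_img * noise).sum() is a scalar; its gradient w.r.t. latents is a
    # noise-weighted vector-Jacobian product through the whole model.
    noise = torch.randn_like(fake_img) * noise_scale

    # create_graph=True keeps the full backward graph so reg_loss can itself be
    # backpropagated (double backward), which roughly doubles activation memory
    # and requires a second-order backward through the attention layers.
    grad, = autograd.grad(
        outputs=(fake_img * noise).sum(), inputs=latents, create_graph=True
    )

    # Penalize the per-sample gradient norm.
    return grad.pow(2).flatten(1).mean(dim=1).mean()
```

Besides lowering batch_size, one possible workaround (an assumption about the setup, not something the script already does) is to force the math SDPA backend just for this forward pass, since flash-attention kernels generally do not implement the double backward that `create_graph=True` requires, e.g. with `torch.backends.cuda.sdp_kernel(enable_flash=False, enable_mem_efficient=False, enable_math=True)` in PyTorch 2.x.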