Jack000 / glid-3-xl-stable

stable diffusion training
MIT License

The advantage 👍 of this network arch #3


TabuaTambalam commented 2 years ago

Seems more VRAM-efficient than the original LDM/SD. On a Colab free-tier T4, this can work with a [1, 4, 104, 112] latent (832x896 image) without a CUDA OOM, while the original can only work with [1, 4, 88, 96] (704x768). Both under fp16.
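(For reference, those numbers follow the usual 8x spatial downsampling of SD's KL-f8 autoencoder, so a [N, 4, H/8, W/8] latent corresponds to an HxW image. The little helper below is just illustrative, not code from this repo:)

```python
def latent_shape(batch, height, width, channels=4, factor=8):
    """Latent tensor shape [N, C, H/f, W/f] for a given pixel resolution."""
    assert height % factor == 0 and width % factor == 0
    return [batch, channels, height // factor, width // factor]

print(latent_shape(1, 832, 896))  # -> [1, 4, 104, 112]
print(latent_shape(1, 704, 768))  # -> [1, 4, 88, 96]
```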

The issues I encountered: without re-training, clip_proj is empty, and image_embed seems like it must be None (otherwise I get a conv error). So is it possible to use image_embed without re-training?

The original LDM/SD has 6 other samplers from k-diffusion. You can see a minimal (zero extra dependencies) ripoff of k-diffusion in my notebook: https://github.com/TabuaTambalam/DalleWebms/blob/main/docs/debugging/LDM_SR_jited.ipynb (my ripoff also gets sigmas_karras and eta (ddim_eta) working, unlike all the other k-diffusion copypastas).
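For anyone curious, the sigmas_karras part is only a few lines. Here's a self-contained sketch of the idea, following Karras et al. 2022 / k-diffusion's get_sigmas_karras; the sigma_min/sigma_max defaults are just example values for an SD-like model, not this repo's settings:

```python
import torch

def sigmas_karras(n, sigma_min=0.03, sigma_max=14.6, rho=7.0):
    """Noise levels spaced as in Karras et al. 2022 (rho=7), with a trailing 0."""
    ramp = torch.linspace(0, 1, n)
    min_inv_rho = sigma_min ** (1 / rho)
    max_inv_rho = sigma_max ** (1 / rho)
    sigmas = (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho
    return torch.cat([sigmas, torch.zeros(1)])

print(sigmas_karras(10))  # 10 noise levels from sigma_max down to sigma_min, then 0
```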

Will this network arch get more samplers than PLMS and DDIM in the future?

Also, did you try JIT (torch.jit.trace()) on this network arch? JIT can help check whether there are weird pythonic things in the code. I followed Ailia's instructions https://github.com/axinc-ai/ailia-models/issues/830 and turned the original LDM/SD into a JIT module (that's the notebook above); I wonder if this arch can also be JIT'd.
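Roughly what I mean, with a toy stand-in module just to show the trace call pattern (the real forward signature and shapes in this repo may differ):

```python
import torch
import torch.nn as nn

class ToyUNet(nn.Module):
    """Stand-in for the real UNet, only to illustrate the trace call."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(4, 4, 3, padding=1)
    def forward(self, x, timesteps, context):
        return self.conv(x)  # the real model also uses timesteps/context

unet = ToyUNet().eval()
x   = torch.randn(1, 4, 64, 64)   # noisy latent
t   = torch.tensor([500])         # diffusion timestep
ctx = torch.randn(1, 77, 768)     # CLIP token embeddings

with torch.no_grad():
    # strict=False tolerates non-tensor outputs; tracing still bakes in any
    # data-dependent Python control flow, which is what it would flush out here.
    traced = torch.jit.trace(unet, (x, t, ctx), strict=False)
traced.save('unet_traced.pt')
```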

Jack000 commented 2 years ago

clip_proj should be removed. It was meant to project a (single) CLIP embedding to the DDPM timestep embedding dimension, to replicate GLIDE, which was the original goal of this project. Stable Diffusion doesn't use a single CLIP embedding but instead the 77 token embeddings from the CLIP text encoder, so this key is not needed anymore.
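A quick illustration of the difference; the dimensions are typical of GLIDE/SD v1-sized models, not necessarily this repo's exact config:

```python
import torch
import torch.nn as nn

N = 2
time_embed = torch.randn(N, 1024)   # DDPM timestep embedding (dim is illustrative)

# GLIDE-style: one pooled CLIP embedding per image, projected to the timestep-embedding
# dimension and added to it -- this is the role clip_proj used to play.
clip_embed = torch.randn(N, 768)
clip_proj = nn.Linear(768, 1024)
glide_cond = time_embed + clip_proj(clip_embed)

# SD-style: the full sequence of 77 token embeddings from the frozen CLIP text encoder
# is fed to the UNet's cross-attention layers as context, so clip_proj is never used.
context = torch.randn(N, 77, 768)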

image_embed is used to give the unet an image for conditioning (for inpainting or upscaling). For normal use it should be set to None.
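A minimal sketch of how that conditioning is typically wired in (channel concatenation onto the latent, as in the OpenAI upsampler code this UNet derives from); shapes and channel counts here are illustrative:

```python
import torch

x = torch.randn(1, 4, 64, 64)              # noisy latent
image_embed = torch.randn(1, 4, 64, 64)    # encoded conditioning image (inpaint/upscale)
x_in = torch.cat([x, image_embed], dim=1)  # [1, 8, 64, 64]
# A base checkpoint's first conv only expects 4 input channels, hence the conv error
# when image_embed is not None; the inpaint/upscale checkpoints are built with the
# extra input channels, which is why using image_embed needs the matching model.
```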

The latent diffusion code uses the OpenAI UNet code directly, with only slight modifications. That's why it was so easy to just patch the original OpenAI repo to use the LDM models. In theory they should be identical; I'm not sure what could cause differences in performance.

I actually have no idea about the other samplers; I'll have to look into it.

Anyways, I'll update the code soon to resolve some of these issues. I'm currently busy training some new models.