AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0
143.84k stars, 27.05k forks

[Feature Request]: Kandinsky 2.1 inherits best practices from DALL-E 2 and latent diffusion, while introducing some new ideas. #9493

Closed. Gitterman69 closed this issue 1 year ago.

Gitterman69 commented 1 year ago

Is there an existing issue for this?

What would your feature do ?

https://github.com/ai-forever/Kandinsky-2

Kandinsky 2.1 inherits best practices from DALL-E 2 and latent diffusion, while introducing some new ideas.

As text and image encoders, it uses a CLIP model and a diffusion image prior (mapping) between the latent spaces of the CLIP modalities. This approach improves the model's visual performance and opens up new possibilities in image blending and text-guided image manipulation.

For the diffusion mapping of latent spaces, we use a transformer with num_layers=20, num_heads=32 and hidden_size=2048.
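As a rough sanity check on those hyperparameters, the sketch below estimates the weight count of a transformer stack of that shape. It assumes a standard block layout (four attention projections plus an MLP with a 4x expansion) and ignores biases, layer norms and embeddings. None of that is confirmed by the Kandinsky repository, and `approx_transformer_params` is a hypothetical helper written for this illustration.

```python
def approx_transformer_params(num_layers: int, hidden_size: int, mlp_ratio: int = 4) -> int:
    """Rough weight count for a transformer stack.

    Assumption: a standard block with Q, K, V and output projections plus a
    two-layer MLP. Biases, layer norms and embeddings are ignored.
    """
    attention = 4 * hidden_size * hidden_size        # Q, K, V and output projections
    mlp = 2 * mlp_ratio * hidden_size * hidden_size  # up and down projections
    return num_layers * (attention + mlp)


# Hyperparameters quoted in the post above.
prior_params = approx_transformer_params(num_layers=20, hidden_size=2048)

# Per-head width if hidden_size is split evenly across num_heads=32.
head_dim = 2048 // 32

print(f"{prior_params / 1e9:.2f}B parameters, head_dim={head_dim}")
```

Under these assumptions the estimate comes out at about 1B weights, which is in line with the size quoted for the diffusion image prior.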

Other architecture parts:

- Text encoder (XLM-Roberta-Large-Vit-L-14) - 560M
- Diffusion Image Prior - 1B
- CLIP image encoder (ViT-L/14) - 427M
- Latent Diffusion U-Net - 1.22B
- MoVQ encoder/decoder - 67M

Kandinsky 2.1 was trained on a large-scale image-text dataset LAION HighRes and fine-tuned on our internal datasets.
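For anyone sizing up an integration, the component sizes listed above can be added up to get a rough idea of the total weight footprint. The sketch below is back-of-the-envelope arithmetic only. It assumes every component is held in memory at once in fp16 (2 bytes per parameter) and counts weights alone, not activations or framework overhead.

```python
# Parameter counts in millions, copied from the component list above.
components_millions = {
    "Text encoder (XLM-Roberta-Large-Vit-L-14)": 560,
    "Diffusion Image Prior": 1000,
    "CLIP image encoder (ViT-L/14)": 427,
    "Latent Diffusion U-Net": 1220,
    "MoVQ encoder/decoder": 67,
}

total_millions = sum(components_millions.values())

# Assumption: fp16 storage, 2 bytes per parameter, weights only.
fp16_gib = total_millions * 1e6 * 2 / 2**30

print(f"total: {total_millions / 1000:.2f}B parameters")
print(f"approx. fp16 weights: {fp16_gib:.1f} GiB")
```

The total is roughly 3.3B parameters. Actual memory use depends on which components are loaded together and at what precision.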

Proposed workflow

  1. Go to ....
  2. Press ....
  3. ...

Additional information

Kandinsky 2.0 is a latent diffusion model with two multilingual text encoders:

- mCLIP-XLMR - 560M parameters
- mT5-encoder-small - 146M parameters

These encoders and multilingual training datasets unveil the real multilingual text-to-image generation experience!

Kandinsky 2.0 was trained on a large 1B multilingual set, including samples that we used to train Kandinsky.

In terms of diffusion architecture, Kandinsky 2.0 implements a UNet with 1.2B parameters.

Kandinsky 2.0 architecture overview:

0-NiK-0 commented 1 year ago

https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues?q=Kandinsky
https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions?discussions_q=Kandinsky

user425846 commented 1 year ago

Is there any update on this?

FarisHijazi commented 1 year ago

I'm also interested in this. I might even work on integrating it, but are the authors interested in maintaining it? @AUTOMATIC1111, please let us know before I open a PR.