Kandinsky For Automatic1111 Extension

Adds a script that runs Kandinsky 2.X models (2.1 and 2.2). Kandinsky 2.2 can generate larger images, but it is much slower when used with VRAM optimizations.

Note: The progress bar in the web UI is not supported; watch the progress bar in the terminal instead.

Troubleshooting

Ignore the warning "Pipelines loaded with torch_dtype=torch.float16 cannot run with cpu device...". It appears because the Kandinsky model or prior is being moved to RAM to save VRAM.
NameError: name 'DiffusionPipeline' is not defined (or any other NameError)
- Usually happens right after installation.
- Solution: Close Automatic1111 completely so the installation can finish, then open it again. The browser window may need to be refreshed.
AttributeError: 'KandinskyModel' object has no attribute 'ema_scope'
- The real error is probably a "CUDA out of memory" error printed above the AttributeError.
- Solution: In the script section, try reloading the Stable Diffusion model, and then unloading it.
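
When the terminal output is long, the out-of-memory line can be hard to spot above the AttributeError. The following is a minimal sketch, not part of the extension, that scans saved terminal output for an out-of-memory line printed before the AttributeError. The function name `find_root_cause` and both patterns are assumptions made for illustration; adjust them to match your own log.

```python
import re
from typing import Optional

# Both patterns are assumptions for illustration; adjust them to your own terminal output.
OOM_PATTERN = re.compile(r"CUDA out of memory", re.IGNORECASE)
MASKING_PATTERN = re.compile(
    r"AttributeError: 'KandinskyModel' object has no attribute 'ema_scope'"
)


def find_root_cause(log_text: str) -> Optional[str]:
    """Return the first out-of-memory line printed before the masking AttributeError.

    Returns None if the AttributeError is absent or no out-of-memory line precedes it.
    """
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if MASKING_PATTERN.search(line):
            # Only the lines printed before the AttributeError are of interest.
            for earlier in lines[:index]:
                if OOM_PATTERN.search(earlier):
                    return earlier.strip()
            return None
    return None
```

If the function returns a line, treat the out-of-memory condition as the problem to solve (for example by lowering the resolution or freeing VRAM) rather than the AttributeError.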

Examples

The following examples are not cherry-picked and use various settings and resolutions.

center image

Prompt: sky, daylight, realistic, high quality, in focus, 16k, HQ
Model: Kandinsky 2.1
Steps: 64
Sampler: Default
CFG Scale: 7
Prior CFG Scale: 7
Seed: 3479955
Size: 1024x1024
Inference Steps: 128

center image

Prompt: As the sun sets, les arbres whisper, mientras el río serpentea gracefully, отражая прекрасные colors, majestic mountains stand tall, evoking tranquillité et harmonie, 空中舞动着美丽的蝴蝶, 空と地球の神秘なつながり, रंगबिरंगी वस्तुएं। (from chatgpt)
In English: As the sun sets, the trees whisper, while the river gracefully meanders, reflecting beautiful colors, majestic mountains stand tall, evoking tranquility and harmony, butterflies dance in the air, the mysterious connection between sky and earth, colorful objects.
Model: Kandinsky 2.1
Steps: 64
Sampler: Default
CFG Scale: 7
Prior CFG Scale: 7
Seed: 3479955
Size: 768x768
Inference Steps: 128

center image

Prompt: cat, realistic, high quality, 4k
Model: Kandinsky 2.1
Steps: 64
Sampler: Default
CFG Scale: 7
Prior CFG Scale: 7
Seed: 3479955
Size: 1024x1024
Inference Steps: 128

center image

Prompt: spaceship, retro, realistic, high quality, 4k
Model: Kandinsky 2.1
Steps: 64
Sampler: Default
CFG Scale: 7
Prior CFG Scale: 7
Seed: 3479955
Size: 512x512
Inference Steps: 128

center image

Prompt: cyberpunk city, distopian, high quality, 4k
Model: Kandinsky 2.1
Steps: 64
Sampler: Default
CFG Scale: 3
Prior CFG Scale: 3
Seed: 3479955
Size: 768x768
Inference Steps: 128

Image Mixing

Combine images and/or prompts. This can be used for style transfer and for combining a background with a subject.

Prompt: cat, high quality, 4k
Model: Kandinsky 2.1
Steps: 64
Sampler: Default
CFG Scale: 7
Prior CFG Scale: 7
Seed: 3479955494
Size: 1536x768
Inference Steps: 128

Mixed with:

center image

Result:

center image

How To Use

Select "Kandinsky" in the scripts section
Set "Prior Inference Steps". Increasing the value improves the results, but it reaches a plateau at around 128. Beyond that, the image may change, but the quality remains consistent.
The model will start downloading automatically, if needed.
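
For readers who want to see roughly what happens behind the script, here is an untested sketch of running Kandinsky 2.1 directly with the diffusers library. It is not the extension's code. The mapping from UI settings to pipeline arguments ("Prior Inference Steps" and "Prior CFG Scale" to the prior, "Steps" and "CFG Scale" to the decoder) is an assumption, as are the function names and the keys of the `ui` dictionary. The pipeline classes and the "kandinsky-community/kandinsky-2-1-prior" checkpoint come from diffusers and Hugging Face, not from this README.

```python
from typing import Any, Dict, Optional, Tuple


def build_call_kwargs(ui: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split UI-style settings into (prior kwargs, decoder kwargs).

    The key names in `ui` are hypothetical, and the mapping itself is an assumption.
    """
    prior_kwargs = {
        "num_inference_steps": ui["prior_inference_steps"],
        "guidance_scale": ui["prior_cfg_scale"],
    }
    decoder_kwargs = {
        "num_inference_steps": ui["steps"],
        "guidance_scale": ui["cfg_scale"],
        "width": ui["width"],
        "height": ui["height"],
    }
    return prior_kwargs, decoder_kwargs


def generate(prompt: str, ui: Dict[str, Any], negative_prompt: Optional[str] = None):
    """Run Kandinsky 2.1 with diffusers. Needs torch, diffusers, and a CUDA GPU."""
    # Imported lazily so the mapping helper above works without these packages.
    import torch
    from diffusers import KandinskyPipeline, KandinskyPriorPipeline

    prior_kwargs, decoder_kwargs = build_call_kwargs(ui)
    generator = torch.Generator(device="cuda").manual_seed(ui["seed"])

    # Stage 1: the prior turns the prompt into image embeddings.
    prior = KandinskyPriorPipeline.from_pretrained(
        "kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16
    ).to("cuda")
    image_embeds, negative_image_embeds = prior(
        prompt, negative_prompt=negative_prompt, generator=generator, **prior_kwargs
    ).to_tuple()

    # Stage 2: the decoder turns the embeddings into an image.
    decoder = KandinskyPipeline.from_pretrained(
        "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
    ).to("cuda")
    decoder.enable_attention_slicing()  # trades speed for lower VRAM use
    result = decoder(
        prompt,
        negative_prompt=negative_prompt,
        image_embeds=image_embeds,
        negative_image_embeds=negative_image_embeds,
        generator=generator,
        **decoder_kwargs,
    )
    return result.images[0]
```

A call such as `generate("cat, realistic, high quality, 4k", {"prior_inference_steps": 128, "prior_cfg_scale": 7, "steps": 64, "cfg_scale": 7, "width": 768, "height": 768, "seed": 3479955})` would return a PIL image, though it is not guaranteed to match what the extension produces for the same settings.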

Image Mixing

Prompt + Image

In text2img set the prompt
In the extra image field in the script section, set the image
Set the "Interpolate Image 1 Strength" to the desired amount of the image generated by the prompt
Set the "Interpolate Image 2 Strength" to the desired amount of the image in the script section

Image + Image

In img2img set an image
In the extra image field in the script section, set the image
Set the "Interpolate Image 1 Strength" to the desired amount of the img2img image
Set the "Interpolate Image 2 Strength" to the desired amount of the image in the script section
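
To build intuition for the two strength sliders, the toy sketch below mixes two embedding vectors as a strength-weighted sum. This assumes that image mixing boils down to a weighted sum of the prompt and image embeddings, which is how the prior's interpolation helper in diffusers appears to work; whether the extension normalizes or otherwise adjusts the strengths is not stated here. The function name `mix_embeddings` and the two-element vectors are purely illustrative.

```python
from typing import List, Sequence


def mix_embeddings(
    embeddings: Sequence[Sequence[float]], strengths: Sequence[float]
) -> List[float]:
    """Return the strength-weighted sum of equally sized embedding vectors."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    if len(embeddings) != len(strengths):
        raise ValueError("need exactly one strength per embedding")
    size = len(embeddings[0])
    if any(len(embedding) != size for embedding in embeddings):
        raise ValueError("all embeddings must have the same length")

    mixed = [0.0] * size
    for embedding, strength in zip(embeddings, strengths):
        for i, value in enumerate(embedding):
            # Each input contributes in proportion to its strength.
            mixed[i] += strength * value
    return mixed


# Toy stand-ins for the "Image 1" and "Image 2" embeddings.
image_1 = [1.0, 0.0]
image_2 = [0.0, 1.0]
mixed = mix_embeddings([image_1, image_2], [0.3, 0.7])
```

Under this assumption, raising one strength pulls the result toward that input, which matches the "desired amount" wording of the two settings.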

Notes

Prompt size is 512 tokens
Seeds are somewhat consistent across different resolutions
Changing the sampling steps keeps the same image but changes its quality
The seed is not as important as the prompt; the subjects and compositions are very similar across seeds
It is very easy to "overcook" images with prompts; if this happens, remove keywords or reduce the CFG Scale
- Negative prompts aren't needed, so "low quality, bad quality..." can be omitted
- Short positive prompts work well; too many keywords confuse the AI

Features

Kandinsky 2.1
- Text to image
- Batching
- Img2img
- Inpainting
- Image mixing
- VRAM optimizations (16 bit float and attention slicing)
Kandinsky 2.2
- Text to image
- Batching
- VRAM optimizations (16 bit float and attention slicing)

Supported Settings

prompt
negative prompt
cfg scale
seed
width
height
sampling steps
denoising strength
batch count
batch size (only first image's seed can be replicated)
img2img image, and inpaint
inpaint at full resolution (needs fixing)

Any other settings, such as seed variations, have no effect on generated images.
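
Because unsupported settings are ignored silently, it can help to check a set of parameters before relying on them. The sketch below splits a settings dictionary into the entries from the list above and everything else. It is not part of the extension; the key names and the `split_settings` helper are hypothetical and only mirror the list.

```python
from typing import Any, Dict, Tuple

# Hypothetical key names that mirror the supported settings listed above.
SUPPORTED_SETTINGS = frozenset({
    "prompt",
    "negative_prompt",
    "cfg_scale",
    "seed",
    "width",
    "height",
    "sampling_steps",
    "denoising_strength",
    "batch_count",
    "batch_size",
    "img2img_image",
    "inpaint_mask",
    "inpaint_full_res",
})


def split_settings(settings: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (used, ignored): settings on the supported list and those that are not."""
    used = {key: value for key, value in settings.items() if key in SUPPORTED_SETTINGS}
    ignored = {key: value for key, value in settings.items() if key not in SUPPORTED_SETTINGS}
    return used, ignored


used, ignored = split_settings(
    {"prompt": "cat, high quality, 4k", "seed": 3479955, "variation_seed": 42}
)
```

Here `ignored` contains only the hypothetical `variation_seed` entry, a reminder that seed variations will not change the output.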

Known Bugs

Potential memory leak when switching models; this seems to be a problem with [diffusers](https://github.com/huggingface/diffusers/issues/2284)

Limitations

Uses the diffusers image generation pipeline to run Kandinsky (only "kandinsky-community/kandinsky-2-1" is supported on Hugging Face, so no custom models)
No ControlNet
No training
No support for other extensions like ultimate-upscale, tiled diffusion, etc.
No progress bar in GUI
No choice for samplers
The Stable Diffusion model and VAE are not unloaded from RAM, resulting in ~15 GB of RAM usage
Not possible to replicate seed in batches
Strength of words in the prompt can't be set
Other Automatic1111 features such as seed variations, hires fix, tiling, etc. are not supported
Can't be run with other Automatic1111 scripts

MMqd / kandinsky-for-automatic1111