Centurion-Rome opened this issue 1 year ago
It's not about larger images. Incoherence at larger sizes can already be fixed by enabling the high-resolution fix checkbox.
CLIP guidance is a slower process that runs CLIP at every sampling step, and it's more about helping the model follow the prompt in finer detail than about maintaining coherence at large sizes. CLIP guidance gets Stable Diffusion a lot closer to DALL-E 2 in terms of correctly understanding prompts (it isn't perfect, but it's better).
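For anyone curious what "running CLIP at every step" looks like mechanically, here is a minimal sketch of the general technique (not DreamStudio's or the webui's actual code): score the partially denoised image with CLIP against the prompt at each step and follow the gradient of the text-image similarity. The open_clip loading calls are real; `denoised`, `prompt`, and `scale` are illustrative placeholders, and a real Stable Diffusion implementation would decode (or approximate) latents to pixels before feeding CLIP.

```python
import torch
import torch.nn.functional as F
import open_clip

# Sketch only: these are real open_clip checkpoints (the same ones used in
# the examples below); everything downstream of loading is illustrative.
model, _, _ = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def clip_guidance_grad(denoised, prompt, scale=200.0):
    """Gradient of CLIP text-image similarity w.r.t. the current sample."""
    x = denoised.detach().requires_grad_(True)
    # CLIP expects small square inputs; a real implementation would also
    # normalize with CLIP's mean/std and may use random crops for stability.
    img = F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)
    img_feat = F.normalize(model.encode_image(img), dim=-1)
    txt_feat = F.normalize(model.encode_text(tokenizer([prompt])), dim=-1)
    sim = (img_feat * txt_feat).sum()
    (grad,) = torch.autograd.grad(sim * scale, x)
    return grad  # the sampler adds this to its update at every step
```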
So how do you integrate this with the interface code? The filenames don't quite line up, so I can't simply copy and paste the code changes mentioned.
Was CLIP Guidance ever implemented into Automatic1111?
Worth noting that implementing native CLIP Guidance would allow for dramatic improvements to outpainting, see https://www.reddit.com/r/StableDiffusion/comments/ysv5lk/outpainting_mk3_demo_gallery/
Any new developments on this by chance?
Hi, I implemented the code from Birch-san's repository into the webui. I don't know anything about the underlying math, but it seems to work okay:
https://github.com/space-nuko/stable-diffusion-webui/tree/feature/clip-guidance
Note that it is very slow even for a single image, and some people recommend >50 steps for best results. Also note that this implementation only works when batch_size=1.
Also, for the record, I think this would be very difficult to turn into an extension, since it requires modifications to how the Stable Diffusion samplers work.
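To illustrate why: the guidance has to run inside every denoising step with gradients enabled, so the model wrapper the sampler calls has to change rather than being hooked from outside. A hypothetical sketch in the style of a k-diffusion denoiser wrapper; `inner_model` and `grad_fn` are placeholder names, not the fork's actual ones:

```python
import torch

class CLIPGuidedDenoiser(torch.nn.Module):
    """Hypothetical wrapper around a k-diffusion-style denoiser."""

    def __init__(self, inner_model, grad_fn, prompt, guidance_scale):
        super().__init__()
        self.inner_model = inner_model
        self.grad_fn = grad_fn          # e.g. the CLIP gradient sketched above
        self.prompt = prompt
        self.guidance_scale = guidance_scale

    def forward(self, x, sigma, **kwargs):
        # Samplers normally run under torch.no_grad(), which is part of why
        # this can't be bolted on as an extension: gradients must be enabled
        # inside every step, for a batch of one image.
        with torch.enable_grad():
            denoised = self.inner_model(x, sigma, **kwargs)
            grad = self.grad_fn(denoised, self.prompt)
        return denoised + grad * self.guidance_scale
```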
Some examples I made with the Euler a sampler, 50 steps. The images aren't reproduced here; the settings for each were:
- No CLIP guidance
- ViT-B-16-plus-240, pretrained=laion400m_e32, CLIP guidance scale=200
- roberta-ViT-B-32, pretrained=laion2b_s12b_b32k, CLIP guidance scale=250
- ViT-B-32, pretrained=laion2b_s34b_b79k, CLIP guidance scale=200
- ViT-B-32, pretrained=laion2b_s34b_b79k, CLIP guidance scale=300
- ViT-B-32, pretrained=laion2b_s34b_b79k, CLIP guidance scale=400
I installed it, but I'm getting constant CUDA out-of-memory errors. I reduced the CLIP guidance scale to 50, just in case, but it made no difference. For example:
RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 8.00 GiB total capacity; 7.22 GiB already allocated; 0 bytes free; 7.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
This is using an RTX 2080 Super with 8GB VRAM.
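For reference, the allocator tweak the error message itself suggests can be tried by setting the environment variable before CUDA is initialized; the 128 MiB value below is just a common starting point, not something specific to this fork:

```python
import os

# Must be set before CUDA is first used (e.g. at the top of the launcher).
# Caps the size of allocator blocks that can be split, reducing fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```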
Yeah, I think the VRAM requirements are just really high; I don't remember it taking less than 16GB for me even with xformers enabled.
Part of the reason is that I had to turn off checkpointing for it to work. That's a feature that saves VRAM, but apparently it can't be used with some torch features (torch.autograd.grad() in this case). I don't know if it just has to be implemented like that, or if there's another way that an actual ML whiz could figure out.
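For what it's worth, this is a known PyTorch limitation: the reentrant form of gradient checkpointing only supports .backward(), so calling torch.autograd.grad() through a checkpointed block raises an error. A tiny self-contained illustration, with the non-reentrant variant as a possible workaround (whether that helps in the webui's case is untested):

```python
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    return torch.tanh(x) * 2

x = torch.randn(4, requires_grad=True)

# With use_reentrant=True (the historical default), torch.autograd.grad()
# raises "Checkpointing is not compatible with .grad() ...".
# The non-reentrant variant supports it:
y = checkpoint(block, x, use_reentrant=False).sum()
(grad,) = torch.autograd.grad(y, x)
print(grad)
```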
Is there any way to run this on 8GB of VRAM?
Is this available as an extension or is it a full fork?
It's a fork for now. I had to make some changes to the original code to get it to work correctly, and I'm still trying to figure out how to improve the performance.
As soon as you have a possible way to work with 8GB VRAM, drop a note here and I will gladly help test.
I hope this can be installed as an extension rather than a fork, or implemented directly in A1111.
Is your feature request related to a problem? Please describe.
Sometimes bigger images are not coherent.

Describe the solution you'd like
See the idea behind this post: https://www.reddit.com/r/StableDiffusion/comments/y4fekg/dreamstudio_will_now_use_clip_guidance_to_enhance/