Centurion-Rome opened this issue 1 year ago
It's not about larger images. Incoherence at larger sizes can already be fixed by enabling the high-resolution fix checkbox.
CLIP guidance is a slower process that runs CLIP at every sampling step, and it's more about helping the model follow the prompt in finer detail than about maintaining coherence at large sizes. CLIP guidance gets Stable Diffusion a lot closer to DALL-E 2 in terms of correctly understanding prompts (it isn't perfect, but it's better).
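For anyone curious what "running CLIP at every step" looks like mechanically, here is a minimal sketch of the general technique (not DreamStudio's or the webui's actual code): score the partially denoised image with CLIP against the prompt at each step and follow the gradient of the text-image similarity. The open_clip loading calls are real; `denoised`, `prompt`, and `scale` are illustrative placeholders, and a real Stable Diffusion implementation would decode (or approximate) latents to pixels before feeding CLIP.

```python
import torch
import torch.nn.functional as F
import open_clip

# Sketch only: these are real open_clip checkpoints (the same ones used in
# the examples below); everything downstream of loading is illustrative.
model, _, _ = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def clip_guidance_grad(denoised, prompt, scale=200.0):
    """Gradient of CLIP text-image similarity w.r.t. the current sample."""
    x = denoised.detach().requires_grad_(True)
    # CLIP expects small square inputs; a real implementation would also
    # normalize with CLIP's mean/std and may use random crops for stability.
    img = F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)
    img_feat = F.normalize(model.encode_image(img), dim=-1)
    txt_feat = F.normalize(model.encode_text(tokenizer([prompt])), dim=-1)
    sim = (img_feat * txt_feat).sum()
    (grad,) = torch.autograd.grad(sim * scale, x)
    return grad  # the sampler adds this to its update at every step
```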
So how do you integrate this with the interface code? The filenames don't quite line up, so I can't simply copy and paste the code changes mentioned.
Was CLIP Guidance ever implemented into Automatic1111?
Worth noting that implementing native CLIP Guidance would allow for dramatic improvements to outpainting, see https://www.reddit.com/r/StableDiffusion/comments/ysv5lk/outpainting_mk3_demo_gallery/
Any new developments on this by chance?
Hi, I implemented the code from Birch-san's repository into the webui. I don't know anything about the underlying math, but it seems to work okay:
https://github.com/space-nuko/stable-diffusion-webui/tree/feature/clip-guidance
Note that it is very slow even for a single image, and some people recommend >50 steps for best results. Also note that this implementation only works when batch_size=1.
Also, for the record, I think this would be very difficult to turn into an extension, since it requires modifications to how the Stable Diffusion samplers work.
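To illustrate why: the guidance has to run inside every denoising step with gradients enabled, so the model wrapper the sampler calls has to change rather than being hooked from outside. A hypothetical sketch in the style of a k-diffusion denoiser wrapper; `inner_model` and `grad_fn` are placeholder names, not the fork's actual ones:

```python
import torch

class CLIPGuidedDenoiser(torch.nn.Module):
    """Hypothetical wrapper around a k-diffusion-style denoiser."""

    def __init__(self, inner_model, grad_fn, prompt, guidance_scale):
        super().__init__()
        self.inner_model = inner_model
        self.grad_fn = grad_fn          # e.g. the CLIP gradient sketched above
        self.prompt = prompt
        self.guidance_scale = guidance_scale

    def forward(self, x, sigma, **kwargs):
        # Samplers normally run under torch.no_grad(), which is part of why
        # this can't be bolted on as an extension: gradients must be enabled
        # inside every step, for a batch of one image.
        with torch.enable_grad():
            denoised = self.inner_model(x, sigma, **kwargs)
            grad = self.grad_fn(denoised, self.prompt)
        return denoised + grad * self.guidance_scale
```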
Some examples I made with the Euler a sampler, 50 steps. The images aren't reproduced here; the settings for each were:
- No CLIP guidance
- ViT-B-16-plus-240, pretrained=laion400m_e32, CLIP guidance scale=200
- roberta-ViT-B-32, pretrained=laion2b_s12b_b32k, CLIP guidance scale=250
- ViT-B-32, pretrained=laion2b_s34b_b79k, CLIP guidance scale=200
- ViT-B-32, pretrained=laion2b_s34b_b79k, CLIP guidance scale=300
- ViT-B-32, pretrained=laion2b_s34b_b79k, CLIP guidance scale=400
I installed it, but I'm getting constant CUDA out-of-memory errors. I reduced the CLIP guidance scale to 50, just in case, but it made no difference. For example:
RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 8.00 GiB total capacity; 7.22 GiB already allocated; 0 bytes free; 7.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
This is using an RTX 2080 Super with 8GB VRAM.
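For reference, the allocator tweak the error message itself suggests can be tried by setting the environment variable before CUDA is initialized; the 128 MiB value below is just a common starting point, not something specific to this fork:

```python
import os

# Must be set before CUDA is first used (e.g. at the top of the launcher).
# Caps the size of allocator blocks that can be split, reducing fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```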
Yeah, I think the VRAM requirements are just really high; I don't remember it taking less than 16GB for me even with xformers enabled.
Part of the reason is that I had to turn off checkpointing for it to work. That's a feature that saves VRAM, but apparently it can't be used with some torch features (torch.autograd.grad() in this case). I don't know if it just has to be implemented like that, or if there's another way that an actual ML whiz could figure out.
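For what it's worth, this is a known PyTorch limitation: the reentrant form of gradient checkpointing only supports .backward(), so calling torch.autograd.grad() through a checkpointed block raises an error. A tiny self-contained illustration, with the non-reentrant variant as a possible workaround (whether that helps in the webui's case is untested):

```python
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    return torch.tanh(x) * 2

x = torch.randn(4, requires_grad=True)

# With use_reentrant=True (the historical default), torch.autograd.grad()
# raises "Checkpointing is not compatible with .grad() ...".
# The non-reentrant variant supports it:
y = checkpoint(block, x, use_reentrant=False).sum()
(grad,) = torch.autograd.grad(y, x)
print(grad)
```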
Is there any way to run this on 8GB of VRAM?
Is this available as an extension or is it a full fork?
It's a fork for now. I had to make some changes to the original code to get it to work correctly, and I'm still trying to figure out how to improve the performance.
As soon as you have a possible way to work with 8GB VRAM, drop a note here and I will gladly help test.
I hope this can be installed as an extension rather than a fork, or implemented directly in A1111.
Is your feature request related to a problem? Please describe.
Sometimes bigger images are not coherent.

Describe the solution you'd like
See the idea behind this post: https://www.reddit.com/r/StableDiffusion/comments/y4fekg/dreamstudio_will_now_use_clip_guidance_to_enhance/