Closed: frankjiang closed this issue 10 months ago.
Same issue.
Same issue too. Env: M1 Mac.
Same here. IP-Adapter has been buggy and I can't get it to work.
Same here. M1 Mac 8GB, Sonoma 14.1.1.
Information that might be related: Sonoma has previously caused an fp16-related issue with NeuralNet on PyTorch 2.1.0, but that particular problem was solved by updating to 2.2.0.dev20231012. (Issue AUTOMATIC1111/stable-diffusion-webui#13419)
Attempted solutions: launching SD with --no-half "fixes" the problem by forcing all fp16 values into fp32, but it also slows each iteration down by 8-12x (from 2 to 16-20 seconds, in my case). Update: tried enabling the "Upcast cross attention layer to float32" option in Settings -> Stable Diffusion. That didn't work either.
Same here. M1 Max
This works for me:
Patching https://github.com/Mikubill/sd-webui-controlnet/blob/main/scripts/controlmodel_ipadapter.py#L430 to ip_out = torch.nn.functional.scaled_dot_product_attention(q, ip_k.half(), ip_v.half(), attn_mask=None, dropout_p=0.0, is_causal=False), i.e., converting ip_k and ip_v from float to c10::Half by appending .half() to each.
Although I'm not sure this is the right thing to do, I'm able to generate images with SD 1.5 and SDXL with style transfer using ControlNet + IP-Adapter.
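For anyone who wants to see the failure and the fix in isolation, here is a minimal standalone sketch. The tensor shapes and CPU execution are illustrative assumptions (the real code runs inside the webui, typically on MPS); only the scaled_dot_product_attention call itself mirrors the patch above.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, tokens, head_dim). On Apple Silicon with
# --upcast-sampling the UNet's query is fp16, while the IP-Adapter projections
# arrive as fp32 -- mixing them makes scaled_dot_product_attention raise
# "Expected query, key, and value to have the same dtype".
q = torch.randn(1, 8, 77, 64, dtype=torch.float16)
ip_k = torch.randn(1, 8, 16, 64, dtype=torch.float32)
ip_v = torch.randn(1, 8, 16, 64, dtype=torch.float32)

# The patch: cast key and value to half so all three inputs share q's dtype.
# (On CPU this takes the slow math path; it is only meant to show the dtypes.)
ip_out = F.scaled_dot_product_attention(
    q, ip_k.half(), ip_v.half(),
    attn_mask=None, dropout_p=0.0, is_causal=False,
)
print(ip_out.dtype)  # torch.float16
```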
Can anyone verify this solution on their Mac? I don't have a macOS machine to verify this patch. I will merge it into the main branch once it is verified.
I can't compare the results to an Nvidia machine, so I'm going to post a detailed report with image samples just in case this fix caused some weirdness that I can't detect.
My apologies if this response is a bit long; I'd rather be thorough than miss something that an Nvidia owner would notice.
TL;DR:
1) Tested on txt2img and img2img. Didn't find any issues.
2) Outputs in both modes are highly accurate and reproducible.
3) The slowdown due to IPAdapter seems to be within 15% of the original s/it value.
Testing parameters:
Processor: M1 8GB.
OS: Sonoma 14.1.1.
PyTorch version: 2.2.0.dev20231012
Webui arguments on launch: --skip-torch-cuda-test --upcast-sampling --opt-sub-quad-attention --use-cpu interrogate.
Resolutions: 512x512 and 512x768.
IPAdapter settings: ip-adapter_clip -> ip-adapter-plus-face_sd15, Low VRAM, Control Weight 0.7, Steps 0.5-1.0.
Attaching XY grids below to display the results.
Model: Deliberate v2.
Sampler: DPM++ 2M Karras, sampling steps: 20.
Prompt: female nurse, black hair.
Negative prompt: nsfw, disfigured, (deformed), ugly, saturated, doll, cgi, calligraphy, mismatched eyes, poorly drawn, b&w, blurry, missing, ((malformed)), ((out of frame)), model, letters, mangled, old, surreal, ((bad anatomy)), ((deformed legs)), ((deformed arms)).
IPAdapter image: [attached in the original post]
1) 512x512. No issues. Average time per iteration: 1.555 s/it without ControlNet, 1.6 s/it with IPAdapter.
2) 512x768. No issues. Average time per iteration: 2.75 s/it without ControlNet, 2.965 s/it with IPAdapter.
3) Reproducibility test: generating from the same seed three times with IPAdapter turned on, to see whether outputs would differ from each other. No issues.
4) img2img test (using only one seed, testing for accuracy and reproducibility at the same time). No issues.
@Osato28 So the fix works for you too, right? Do you spot anything weird in your generations?
Your generations look pretty cool to me. I'm bad at tuning settings for nice outputs...
If the output does work on Apple Silicon, my only concern is the --upcast-sampling, --no-half settings, etc.; I have a feeling they are related to the error. Simply typecasting with .half() might break users who aren't on Apple Silicon (a dtype-matching alternative is sketched after this comment). I only have an M1 Max, so I'm unable to test on other PCs / GPUs / CPUs...
By the way, my COMMANDLINE_ARGS is:
"--skip-torch-cuda-test --upcast-sampling --opt-sub-quad-attention --medvram --use-cpu Interrogate --no-half-vae --disable-safe-unpickle --autolaunch",
which I thought was optimized for Apple Silicon.
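Regarding the .half() concern above: a hedged sketch of a dtype-agnostic variant (the ip_attention helper name is hypothetical, and this is not the extension's merged code) would cast the key and value to the query's dtype instead of hard-coding half precision:

```python
import torch
import torch.nn.functional as F

def ip_attention(q: torch.Tensor, ip_k: torch.Tensor, ip_v: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: match key/value dtype to the query rather than
    # forcing .half(). For setups running everything in fp32 (e.g. --no-half,
    # or CUDA without upcasting) .to(q.dtype) is a no-op, so non-Apple users
    # are unaffected; when q is fp16 it behaves exactly like the .half() patch.
    return F.scaled_dot_product_attention(
        q,
        ip_k.to(dtype=q.dtype),
        ip_v.to(dtype=q.dtype),
        attn_mask=None, dropout_p=0.0, is_causal=False,
    )
```

Because the query's dtype already reflects whatever precision mode the webui is running in, keying the cast off q.dtype would keep the patch portable across devices.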
@beltonk I didn't spot anything weird and I can't test it on non-Apple Silicon.
Hence the overly detailed test results: I'm hoping that if there is anything weird, it will be caught by someone with a more traditional GPU.
Thank you for posting that fix, by the way. I couldn't make heads or tails of how IPAdapter worked, and I didn't have the courage to blindly typecast values until the error message went away.
Offtopic:
1) Prettiness is not due to prompt engineering but due to the model, Deliberate v2. It's as stable and balanced as models get: it would probably give better results with a shorter negative prompt, but I stopped optimizing that prompt halfway.
2) As for COMMANDLINE_ARGS, I simply kept the most minimal set that prevented crashes and kept performance reasonably high; I didn't optimize it beyond that. --medvram does seem to improve performance with heavier ControlNet models, though; I've added it to my args, thank you.
But I'm afraid that both of those discussions are outside the scope of this issue.
If you wish to initiate testing on several Apple Silicon machines to find an optimal set of COMMANDLINE_ARGS, I think it would be better to start a separate discussion issue in the main AUTOMATIC1111 repo.
Thank you @beltonk -- your fix worked for me too!
Thx @beltonk -- works for me as well!
@beltonk's fix worked for me on an M2 Mac mini.
Worked here! (Mac, M2 / 1111 v 1.7)
Which file should I change? I can't find controlmodel_ipadapter.py.
Thank you, @huchenlei! https://github.com/Mikubill/sd-webui-controlnet/pull/2348
What happened?
IP-Adapter cannot run correctly: image generation raises a RuntimeError.
What should have happened?
The image should have been generated without raising a RuntimeError.
Commit where the problem happens
webui: 5ef669de080814067961f28357256e8fe27544f4
controlnet: 3011ff6e706d3fdd0cc7d2ac8ff0d59020b8f767
Additional information
Also occurs with other IP-Adapter models, e.g. ip-adapter-plus_sd15 [c817b455].