Closed by ynagatomo 1 year ago
Thanks for the report @ynagatomo! I just looked into the required changes to support the XL version (which is not yet publicly available but some people started getting "research preview" access). Looks like we will need to support dual text encoders for inference and a few changes to the architecture code for conversion. I will look into the conversion changes first but contributions are welcome for any part of it!
The working memory demands of these larger models during inference, coupled with the two-pass pipeline, make them a prime candidate for the new quantize option, if it could be implemented in the conversion process for these model types.
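As a rough illustration of why the comment above points at quantization, here is some back-of-envelope arithmetic using the parameter counts quoted later in this thread (the 6-bit figure is a hypothetical quantization width used for comparison, not necessarily what the converter offers):

```python
# Approximate weight-memory footprint per storage width.
BYTES_FP16 = 2.0      # fp16 weights: 2 bytes per parameter
BYTES_6BIT = 6 / 8    # hypothetical 6-bit quantization: 0.75 bytes per parameter

sd21_unet_params = 865e6   # SD2.1 UNet, as quoted in this thread
sdxl_unet_params = 2.6e9   # SDXL UNet, as quoted in this thread

sd21_fp16_gb = sd21_unet_params * BYTES_FP16 / 1e9   # ~1.7 GB
sdxl_fp16_gb = sdxl_unet_params * BYTES_FP16 / 1e9   # ~5.2 GB
sdxl_6bit_gb = sdxl_unet_params * BYTES_6BIT / 1e9   # ~1.95 GB
```

At fp16 the SDXL UNet's weights alone roughly triple the SD2.1 footprint, which is why peak memory during the two-pass (base + refiner) pipeline becomes the bottleneck.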
Thank you for your insight. Understood. There are a lot of challenges in conversion and inference: the three-times-larger UNet (SD2.1: 865M => SDXL: 2.6B params), two text encoders (CLIP ViT-L and OpenCLIP ViT-bigG), and the added pooled text embedding (OpenCLIP ViT-bigG). I'll close this issue. Thanks.
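To make the dual-encoder conditioning above concrete, here is a minimal NumPy sketch of the tensor shapes involved; the shapes follow the standard SDXL configuration and should be treated as assumptions rather than something stated in this thread:

```python
import numpy as np

# SDXL conditions its UNet on two text encoders: the per-token hidden
# states are concatenated along the feature axis, and the larger encoder
# additionally contributes a pooled embedding.
tokens = 77
clip_hidden = np.zeros((1, tokens, 768))       # CLIP ViT-L hidden states
openclip_hidden = np.zeros((1, tokens, 1280))  # OpenCLIP ViT-bigG hidden states

# Concatenated cross-attention conditioning: (1, 77, 2048)
cond = np.concatenate([clip_hidden, openclip_hidden], axis=-1)

# Pooled text embedding from OpenCLIP ViT-bigG: (1, 1280)
pooled = np.zeros((1, 1280))
```

This is why conversion needs both a second text encoder model and an extra pooled-embedding input on the UNet.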
No need to close it! I will post updates here as we gradually support XL
Very glad to hear that. Thank you. :)
Hello everyone, I was able to convert and execute Stable Diffusion XL in my app Guernika.
I have open sourced the code for my Guernika Model Converter, here you can see the scripts I'm using to get this working.
I have also open sourced another repository with Schedulers, a few more than in this repo and all compatible with Stable Diffusion 2.X.
That's great! Slightly off topic but do you have plans to create a PR for the missing schedulers?
@godly-devotion I will probably not be making a PR for the missing schedulers, at most it would be a PR adding that repo as a dependency and using it on this repo but that should be pretty easy to do for anyone.
https://github.com/apple/ml-stable-diffusion/pull/217 should fix these unet errors, give it a try and lmk if you have any issues.
You'll need to add --convert-text-encoder-2 and --pipeline-type SDXL to your CLI command, and change the attention implementation to ORIGINAL, since SPLIT_EINSUM doesn't work very well with this architecture by default.
Thank you for your PR #217 on SDXL. I encountered the error shown below with your convert command, while converting the vae_decoder. Any idea? Thanks in advance.
% git clone https://github.com/ZachNagengast/ml-stable-diffusion/
% git checkout -b sdxl-base-support origin/sdxl-base-support
% pip install invisible-watermark
:
Successfully installed PyWavelets-1.4.1 invisible-watermark-0.2.0 opencv-python-4.8.0.74
% python -m python_coreml_stable_diffusion.torch2coreml --convert-text-encoder --convert-text-encoder-2 --convert-vae-decoder --convert-vae-encoder --convert-unet --model-version stabilityai/stable-diffusion-xl-base-0.9 --compute-unit ALL --attention-implementation ORIGINAL --bundle-resources-for-swift-cli --pipeline-type SDXL -o CoreMLModels
:
INFO:main:Converting vae_decoder to CoreML..
Converting PyTorch Frontend ==> MIL Ops:  22%|████████| 80/369 [00:00<00:00, 2252.19 ops/s]
  File "/Users/ynaga/miniforge3/envs/coreml38/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 199, in _get_inputs
    raise ValueError(
ValueError: node hidden_states.7 (scaled_dot_product_attention) got 6 input(s), expected [7]
@ynagatomo Thanks for trying it out, I was curious if anyone else would have this issue. On my machine I was getting errors saying got 7 input(s), expected [6], so I added an override for scaled_dot_product_attention. To get rid of this error, since your machine doesn't have it, just comment out the line modify_coremltools_torch_frontend_scaled_dot_product_attention() here and here. I'll try to get to the bottom of the issue.
Update: I'm not sure why I was still getting 7 inputs, but I modified _get_inputs to use the min_expected option and drop any inputs after 6.
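A minimal sketch of the workaround described in the update above. The function name here is illustrative, not the actual coremltools internals; the real change hooks into the torch frontend's _get_inputs:

```python
def normalize_sdpa_inputs(inputs, min_expected=6):
    """Accept either 6 or 7 inputs for scaled_dot_product_attention.

    Different torch versions hand the converter a different number of
    inputs for this op; require at least `min_expected` and silently
    drop any extras beyond that.
    """
    if len(inputs) < min_expected:
        raise ValueError(
            f"scaled_dot_product_attention got {len(inputs)} input(s), "
            f"expected at least {min_expected}"
        )
    # Keep only the first `min_expected` inputs, dropping the rest.
    return inputs[:min_expected]
```

With min_expected=6, a node carrying 7 inputs is trimmed to 6 instead of tripping the strict count check in _get_inputs.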
@ZachNagengast Thank you for your comment. I commented out the two modify_coremltools_torch_frontend_scaled_dot_product_attention() calls and now the conversion is in progress. Does your latest push handle this automatically?
Great to hear; yes, the commit is here: https://github.com/apple/ml-stable-diffusion/pull/217/commits/bc373b2ed4d45ac78e71c22e37a3937f9f327a6a
@ZachNagengast The vae_encoder, vae_decoder, and unet were converted. But, probably during the text-encoder conversion, the Mac crashed and auto-restarted. I'm not sure, but it may be due to the beta version of macOS, or a lack of memory: my Mac mini/M2 Pro has 16GB RAM, and according to Activity Monitor the conversion process used 30GB. Does your latest commit about the text-encoder affect this?
Yes, there was an error for base models that got added when I added refiner support. If the unet completed successfully then I wouldn't expect the text encoders to cause memory problems.
@ZachNagengast GM, macOS crashed with the latest commit (which fixed the text_encoder checks) when starting the text-encoder conversion. It may be due to a lack of memory (your 32GB Mac doesn't have this issue). Allowing scaled_dot_product_attention to accept either 6 or 7 inputs works fine.
used commit:
Running MIL backend_mlprogram pipeline: 100%|██████████████████████████████████████| 11/11 [00:00<00:00, 36.71 passes/s]
INFO:main:Saved unet model to CoreMLModels/Stable_Diffusion_version_stabilityai_stable-diffusion-xl-base-0.9_unet.mlpackage
=> and after a while, crashed
When converting the text-encoder, memory pressure rose and macOS crashed. The results are the same with compute-units ALL and CPU_AND_GPU. (BTW, SDXL 1.0 is out.)
You might need more RAM to do it, or else perhaps the fp16 version, but I have uploaded the converted models of v1.0 to hugging face now: https://huggingface.co/ZachNagengast/coreml-stable-diffusion-xl-base-v1.0
I see. I should have chosen 32GB RAM. Normally when there is a memory problem the Python process is killed by the system, but this time the OS is crashing; it may be caused by the macOS beta. Wow, so you've already provided the Core ML models for SDXL 1.0. Thank you, I'll try them.
Occasionally the conversion of SDXL to Core ML completes on a Mac mini/M2 Pro with 16GB RAM. SDXL inference in Swift (PR #218) starts running, but macOS crashes after a while. 32GB RAM seems to be needed. Thanks anyway.
I think it also depends on how much swap memory you have available; I've seen the process take up to 70GB of RAM even though I only have 32GB installed.
Could we hook this work to the one done in ComfyUI? https://github.com/comfyanonymous/ComfyUI/issues/843
@Any-Winter-4079 You can run Core ML models with Python no problem on a Mac, I think it's a good idea 👍
SDXL has been supported. Thank you.
The same scaled_dot_product_attention inputs error just occurred for me.
There is a PR that I think fixes this issue here: https://github.com/apple/coremltools/pull/2021
@l1xiangyi - the PR mentioned by @jrittvo should solve your issue. In the meantime you could also just use torch==2.0.0; that should also solve your problem.
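A small sketch of why pinning torch helps, assuming the extra 7th input comes from the scale argument that torch 2.1 added to scaled_dot_product_attention; that explanation is my assumption, not something stated in this thread, and the helper below is illustrative rather than part of either repo:

```python
def needs_sdpa_workaround(torch_version: str) -> bool:
    """Return True for torch >= 2.1, where scaled_dot_product_attention
    gained an extra `scale` argument that older coremltools releases
    don't expect during tracing/conversion."""
    major, minor = (int(part) for part in torch_version.split(".")[:2])
    return (major, minor) >= (2, 1)
```

Under this assumption, torch==2.0.0 avoids the mismatch because the op carries only the six inputs the converter expects.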
You might need more RAM to do it, or else perhaps the fp16 version, but I have uploaded the converted models of v1.0 to hugging face now: https://huggingface.co/ZachNagengast/coreml-stable-diffusion-xl-base-v1.0
I am unable to access the coreml models
There are converted versions here: https://huggingface.co/apple/coreml-stable-diffusion-xl-base and here: https://huggingface.co/coreml-community/coreml-SDXL-v10-Base-Refiner
Thank you for sharing the great project. When converting SDXL 0.9, an error occurs. How can I solve this?
Command: % python -m python_coreml_stable_diffusion.torch2coreml --convert-text-encoder --convert-vae-decoder --convert-vae-encoder --convert-unet --convert-safety-checker --model-version stabilityai/stable-diffusion-xl-base-0.9 --compute-unit CPU_AND_NE --attention-implementation SPLIT_EINSUM_V2 -o SDXL09Model --chunk-unet --bundle-resources-for-swift-cli
Result:
INFO:main:Converting unet
Traceback (most recent call last):
  File "/Users/ynaga/miniforge3/envs/appleml100_38/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/ynaga/miniforge3/envs/appleml100_38/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/ynaga/MyFiles/SWStudy/AppleMLSDv100/SD21model/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 1340, in <module>
    main(args)
  File "/Users/ynaga/MyFiles/SWStudy/AppleMLSDv100/SD21model/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 1192, in main
    convert_unet(pipe, args)
  File "/Users/ynaga/MyFiles/SWStudy/AppleMLSDv100/SD21model/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 666, in convert_unet
    load_state_dict_summary = reference_unet.load_state_dict(
  File "/Users/ynaga/miniforge3/envs/appleml100_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
  Missing key(s) in state_dict: "down_blocks.2.downsamplers.0.conv.weight", "down_blocks.2.downsamplers.0.conv.bias".
  Unexpected key(s) in state_dict: "add_embedding.linear_1.weight", "add_embedding.linear_1.bias",
  :
  :
  , "mid_block.attentions.0.transformer_blocks.9.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.9.ff.net.2.bias".