Closed by ynagatomo 1 year ago
Thanks for the report @ynagatomo! I just looked into the required changes to support the XL version (which is not yet publicly available but some people started getting "research preview" access). Looks like we will need to support dual text encoders for inference and a few changes to the architecture code for conversion. I will look into the conversion changes first but contributions are welcome for any part of it!
The working memory demands of these larger models during inference, coupled with the two-pass pipeline, make them a prime candidate for the new quantize option, if it could be implemented in the conversion process for these model types.
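As a rough illustration of why the comment above points at quantization, here is some back-of-envelope arithmetic using the parameter counts quoted later in this thread (the 6-bit figure is a hypothetical quantization width used for comparison, not necessarily what the converter offers):

```python
# Approximate weight-memory footprint per storage width.
BYTES_FP16 = 2.0      # fp16 weights: 2 bytes per parameter
BYTES_6BIT = 6 / 8    # hypothetical 6-bit quantization: 0.75 bytes per parameter

sd21_unet_params = 865e6   # SD2.1 UNet, as quoted in this thread
sdxl_unet_params = 2.6e9   # SDXL UNet, as quoted in this thread

sd21_fp16_gb = sd21_unet_params * BYTES_FP16 / 1e9   # ~1.7 GB
sdxl_fp16_gb = sdxl_unet_params * BYTES_FP16 / 1e9   # ~5.2 GB
sdxl_6bit_gb = sdxl_unet_params * BYTES_6BIT / 1e9   # ~1.95 GB
```

At fp16 the SDXL UNet's weights alone roughly triple the SD2.1 footprint, which is why peak memory during the two-pass (base + refiner) pipeline becomes the bottleneck.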
Thank you for your insight. Understood. There are a lot of challenges in conversion and inference: the three-times-larger UNet (SD2.1: 865M => SDXL: 2.6B params), two text encoders (CLIP ViT-L and OpenCLIP ViT-bigG), and the added pooled text embedding (OpenCLIP ViT-bigG). I'll close this issue. Thanks.
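To make the dual-encoder conditioning above concrete, here is a minimal NumPy sketch of the tensor shapes involved; the shapes follow the standard SDXL configuration and should be treated as assumptions rather than something stated in this thread:

```python
import numpy as np

# SDXL conditions its UNet on two text encoders: the per-token hidden
# states are concatenated along the feature axis, and the larger encoder
# additionally contributes a pooled embedding.
tokens = 77
clip_hidden = np.zeros((1, tokens, 768))       # CLIP ViT-L hidden states
openclip_hidden = np.zeros((1, tokens, 1280))  # OpenCLIP ViT-bigG hidden states

# Concatenated cross-attention conditioning: (1, 77, 2048)
cond = np.concatenate([clip_hidden, openclip_hidden], axis=-1)

# Pooled text embedding from OpenCLIP ViT-bigG: (1, 1280)
pooled = np.zeros((1, 1280))
```

This is why conversion needs both a second text encoder model and an extra pooled-embedding input on the UNet.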
No need to close it! I will post updates here as we gradually support XL
Very glad to hear that. Thank you. :)
Hello everyone, I was able to convert and execute Stable Diffusion XL in my app Guernika.
I have open sourced the code for my Guernika Model Converter, here you can see the scripts I'm using to get this working.
I have also open sourced another repository with Schedulers, a few more than in this repo and all compatible with Stable Diffusion 2.X.
That's great! Slightly off topic but do you have plans to create a PR for the missing schedulers?
@godly-devotion I will probably not be making a PR for the missing schedulers, at most it would be a PR adding that repo as a dependency and using it on this repo but that should be pretty easy to do for anyone.
https://github.com/apple/ml-stable-diffusion/pull/217 should fix these unet errors, give it a try and lmk if you have any issues.
You'll need to add --convert-text-encoder-2 and --pipeline-type SDXL to your CLI command, and change the attention implementation to ORIGINAL, since SPLIT_EINSUM doesn't work very well with this architecture by default.
Thank you for your PR #217 on SDXL. I encountered the error shown below with your convert command, while converting the vae_decoder. Any idea? Thanks in advance.
% git clone https://github.com/ZachNagengast/ml-stable-diffusion/
% git checkout -b sdxl-base-support origin/sdxl-base-support
% pip install invisible-watermark
:
Successfully installed PyWavelets-1.4.1 invisible-watermark-0.2.0 opencv-python-4.8.0.74
% python -m python_coreml_stable_diffusion.torch2coreml --convert-text-encoder --convert-text-encoder-2 --convert-vae-decoder --convert-vae-encoder --convert-unet --model-version stabilityai/stable-diffusion-xl-base-0.9 --compute-unit ALL --attention-implementation ORIGINAL --bundle-resources-for-swift-cli --pipeline-type SDXL -o CoreMLModels
:
INFO:main:Converting vae_decoder to CoreML..
Converting PyTorch Frontend ==> MIL Ops:  22%|████████| 80/369 [00:00<00:00, 2252.19 ops/s]
  File "/Users/ynaga/miniforge3/envs/coreml38/lib/python3.8/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 199, in _get_inputs
    raise ValueError(
ValueError: node hidden_states.7 (scaled_dot_product_attention) got 6 input(s), expected [7]
@ynagatomo Thanks for trying it out, I was curious if anyone else would have this issue. On my machine I was getting errors saying got 7 input(s), expected [6], so I added an override for scaled_dot_product_attention. To get rid of this error, since your machine doesn't have it, just comment out the line modify_coremltools_torch_frontend_scaled_dot_product_attention() here and here. I'll try to get to the bottom of the issue.
Update: I'm not sure why I was still getting 7 inputs, but I modified _get_inputs to use the min_expected option and drop any inputs after 6.
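A minimal sketch of the workaround described in the update above. The function name here is illustrative, not the actual coremltools internals; the real change hooks into the torch frontend's _get_inputs:

```python
def normalize_sdpa_inputs(inputs, min_expected=6):
    """Accept either 6 or 7 inputs for scaled_dot_product_attention.

    Different torch versions hand the converter a different number of
    inputs for this op; require at least `min_expected` and silently
    drop any extras beyond that.
    """
    if len(inputs) < min_expected:
        raise ValueError(
            f"scaled_dot_product_attention got {len(inputs)} input(s), "
            f"expected at least {min_expected}"
        )
    # Keep only the first `min_expected` inputs, dropping the rest.
    return inputs[:min_expected]
```

With min_expected=6, a node carrying 7 inputs is trimmed to 6 instead of tripping the strict count check in _get_inputs.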
@ZachNagengast Thank you for your comment. I commented out the two modify_coremltools_torch_frontend_scaled_dot_product_attention() calls and now the conversion is in progress. Does your latest push handle this automatically?
Great to hear; yes, the commit is here: https://github.com/apple/ml-stable-diffusion/pull/217/commits/bc373b2ed4d45ac78e71c22e37a3937f9f327a6a
@ZachNagengast The vae_encoder, vae_decoder, and unet were converted. But, probably during the text-encoder conversion, the Mac crashed and auto-restarted. I'm not sure, but it may be due to the beta version of macOS, or a lack of memory: my Mac mini/M2 Pro has 16GB RAM, and according to Activity Monitor the conversion process used 30GB. Does your latest commit about the text-encoder affect this?
Yes, there was an error for base models that got added when I added refiner support. If the unet completed successfully then I wouldn't expect the text encoders to cause memory problems.
@ZachNagengast GM, macOS crashed with the latest commit (which fixed the text_encoder checks) when starting the text-encoder conversion. It may be due to a lack of memory (your 32GB Mac doesn't have this issue). Allowing scaled_dot_product_attention to accept either 6 or 7 inputs works fine.
used commit:
Running MIL backend_mlprogram pipeline: 100%|██████████████████████████████████████| 11/11 [00:00<00:00, 36.71 passes/s]
INFO:main:Saved unet model to CoreMLModels/Stable_Diffusion_version_stabilityai_stable-diffusion-xl-base-0.9_unet.mlpackage
=> and after a while, crashed
When converting the text-encoder, memory pressure rose and macOS crashed. The results are the same with compute-units ALL and CPU_AND_GPU. (BTW, SDXL 1.0 is out.)
You might need more RAM to do it, or else perhaps the fp16 version, but I have uploaded the converted models of v1.0 to hugging face now: https://huggingface.co/ZachNagengast/coreml-stable-diffusion-xl-base-v1.0
I see. I should have chosen 32GB RAM. Normally when there is a memory problem the Python process is killed by the system, but this time the OS is crashing; it may be caused by the macOS beta. Wow, so you've already provided the Core ML models for SDXL 1.0. Thank you, I'll try them.
Occasionally the conversion of SDXL to Core ML completes on a Mac mini/M2 Pro with 16GB RAM. SDXL inference in Swift (PR #218) starts running, but macOS crashes after a while. 32GB RAM seems to be needed. Thanks anyway.
I think it also depends on how much swap memory you have available; I've seen the process take up to 70GB of RAM even though I only have 32GB installed.
Could we hook this work to the one done in ComfyUI? https://github.com/comfyanonymous/ComfyUI/issues/843
@Any-Winter-4079 You can run Core ML models with Python no problem on a Mac, I think it's a good idea 👍
SDXL has been supported. Thank you.
The same scaled_dot_product_attention inputs error just occurred for me.
There is a PR that I think fixes this issue here: https://github.com/apple/coremltools/pull/2021
@l1xiangyi - the PR mentioned by @jrittvo should solve your issue. In the meantime you could also just use torch==2.0.0; that should also solve your problem.
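A small sketch of why pinning torch helps, assuming the extra 7th input comes from the scale argument that torch 2.1 added to scaled_dot_product_attention; that explanation is my assumption, not something stated in this thread, and the helper below is illustrative rather than part of either repo:

```python
def needs_sdpa_workaround(torch_version: str) -> bool:
    """Return True for torch >= 2.1, where scaled_dot_product_attention
    gained an extra `scale` argument that older coremltools releases
    don't expect during tracing/conversion."""
    major, minor = (int(part) for part in torch_version.split(".")[:2])
    return (major, minor) >= (2, 1)
```

Under this assumption, torch==2.0.0 avoids the mismatch because the op carries only the six inputs the converter expects.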
You might need more RAM to do it, or else perhaps the fp16 version, but I have uploaded the converted models of v1.0 to hugging face now: https://huggingface.co/ZachNagengast/coreml-stable-diffusion-xl-base-v1.0
I am unable to access the coreml models
There are converted versions here: https://huggingface.co/apple/coreml-stable-diffusion-xl-base and here: https://huggingface.co/coreml-community/coreml-SDXL-v10-Base-Refiner
Thank you for sharing the great project. When converting SDXL 0.9, an error occurs. How can I solve this?
Command: % python -m python_coreml_stable_diffusion.torch2coreml --convert-text-encoder --convert-vae-decoder --convert-vae-encoder --convert-unet --convert-safety-checker --model-version stabilityai/stable-diffusion-xl-base-0.9 --compute-unit CPU_AND_NE --attention-implementation SPLIT_EINSUM_V2 -o SDXL09Model --chunk-unet --bundle-resources-for-swift-cli
Result:
INFO:main:Converting unet
Traceback (most recent call last):
  File "/Users/ynaga/miniforge3/envs/appleml100_38/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/ynaga/miniforge3/envs/appleml100_38/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/ynaga/MyFiles/SWStudy/AppleMLSDv100/SD21model/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 1340, in <module>
    main(args)
  File "/Users/ynaga/MyFiles/SWStudy/AppleMLSDv100/SD21model/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 1192, in main
    convert_unet(pipe, args)
  File "/Users/ynaga/MyFiles/SWStudy/AppleMLSDv100/SD21model/ml-stable-diffusion/python_coreml_stable_diffusion/torch2coreml.py", line 666, in convert_unet
    load_state_dict_summary = reference_unet.load_state_dict(
  File "/Users/ynaga/miniforge3/envs/appleml100_38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
  Missing key(s) in state_dict: "down_blocks.2.downsamplers.0.conv.weight", "down_blocks.2.downsamplers.0.conv.bias".
  Unexpected key(s) in state_dict: "add_embedding.linear_1.weight", "add_embedding.linear_1.bias",
  :
  :
  , "mid_block.attentions.0.transformer_blocks.9.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.9.ff.net.2.bias".