RuntimeError: The size of tensor a (64) must match the size of tensor b (96) at non-singleton dimension 3
Test runs:
without --unet-support-controlnet , @ 512x512 -- OK
without --unet-support-controlnet , @ 768x768 -- OK
with --unet-support-controlnet , @ 512x512 -- OK
with --unet-support-controlnet , @ 768x768 -- FAIL
Appears to complete:
Stable_Diffusion_version_diffusers_vae_decoder.mlpackage
Stable_Diffusion_version_diffusers_vae_encoder.mlpackage
Errors when starting:
Stable_Diffusion_version_diffusers_control-unet.mlpackage
INFO:__main__:Initializing StableDiffusionPipeline with ./diffusers..
/Users/jrittvo/miniconda3/envs/python_playground/lib/python3.10/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
warnings.warn(
text_config_dict is provided which will be used to initialize CLIPTextConfig. The value text_config["id2label"] will be overriden.
INFO:__main__:Done.
INFO:__main__:Attention implementation in effect: AttentionImplementations.ORIGINAL
INFO:__main__:Converting vae_decoder
/Users/jrittvo/miniconda3/envs/python_playground/lib/python3.10/site-packages/diffusers/models/resnet.py:127: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert hidden_states.shape[1] == self.channels
/Users/jrittvo/miniconda3/envs/python_playground/lib/python3.10/site-packages/diffusers/models/resnet.py:140: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if hidden_states.shape[0] >= 64:
INFO:__main__:Converting vae_decoder to CoreML..
Converting PyTorch Frontend ==> MIL Ops: 0%| | 0/426 [00:00<?, ? ops/s]WARNING:__main__:Casted the beta(value=0.0) argument of baddbmm op from int32 to float32 dtype for conversion!
Converting PyTorch Frontend ==> MIL Ops: 100%|▉| 425/426 [00:00<00:00, 2270.48 o
Running MIL frontend_pytorch pipeline: 100%|█| 5/5 [00:00<00:00, 330.30 passes/s
Running MIL default pipeline: 100%|████████| 57/57 [00:03<00:00, 17.60 passes/s]
Running MIL backend_mlprogram pipeline: 100%|█| 10/10 [00:00<00:00, 671.22 passe
INFO:__main__:Saved vae_decoder model to ./SD15-Original-768x768/Stable_Diffusion_version_._diffusers_vae_decoder.mlpackage
INFO:__main__:Saved vae_decoder into ./SD15-Original-768x768/Stable_Diffusion_version_._diffusers_vae_decoder.mlpackage
INFO:__main__:Converted vae_decoder
INFO:__main__:Converting vae_encoder
/Users/jrittvo/miniconda3/envs/python_playground/lib/python3.10/site-packages/diffusers/models/resnet.py:200: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert hidden_states.shape[1] == self.channels
/Users/jrittvo/miniconda3/envs/python_playground/lib/python3.10/site-packages/diffusers/models/resnet.py:205: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert hidden_states.shape[1] == self.channels
INFO:__main__:Converting vae_encoder to CoreML..
Converting PyTorch Frontend ==> MIL Ops: 0%| | 0/354 [00:00<?, ? ops/s]WARNING:__main__:Casted the beta(value=0.0) argument of baddbmm op from int32 to float32 dtype for conversion!
Converting PyTorch Frontend ==> MIL Ops: 100%|▉| 353/354 [00:00<00:00, 2195.42 o
Running MIL frontend_pytorch pipeline: 100%|█| 5/5 [00:00<00:00, 455.95 passes/s
Running MIL default pipeline: 100%|████████| 57/57 [00:02<00:00, 28.16 passes/s]
Running MIL backend_mlprogram pipeline: 100%|█| 10/10 [00:00<00:00, 869.65 passe
INFO:__main__:Saved vae_encoder model to ./SD15-Original-768x768/Stable_Diffusion_version_._diffusers_vae_encoder.mlpackage
INFO:__main__:Saved vae_encoder into ./SD15-Original-768x768/Stable_Diffusion_version_._diffusers_vae_encoder.mlpackage
INFO:__main__:Converted vae_encoder
INFO:__main__:Converting unet
INFO:__main__:Sample UNet inputs spec: {'sample': (torch.Size([2, 4, 96, 96]), torch.float32), 'timestep': (torch.Size([2]), torch.float32), 'encoder_hidden_states': (torch.Size([2, 768, 1, 77]), torch.float32), 'additional_residual_0': (torch.Size([2, 320, 64, 64]), torch.float32), 'additional_residual_1': (torch.Size([2, 320, 64, 64]), torch.float32), 'additional_residual_2': (torch.Size([2, 320, 64, 64]), torch.float32), 'additional_residual_3': (torch.Size([2, 320, 32, 32]), torch.float32), 'additional_residual_4': (torch.Size([2, 640, 32, 32]), torch.float32), 'additional_residual_5': (torch.Size([2, 640, 32, 32]), torch.float32), 'additional_residual_6': (torch.Size([2, 640, 16, 16]), torch.float32), 'additional_residual_7': (torch.Size([2, 1280, 16, 16]), torch.float32), 'additional_residual_8': (torch.Size([2, 1280, 16, 16]), torch.float32), 'additional_residual_9': (torch.Size([2, 1280, 8, 8]), torch.float32), 'additional_residual_10': (torch.Size([2, 1280, 8, 8]), torch.float32), 'additional_residual_11': (torch.Size([2, 1280, 8, 8]), torch.float32), 'additional_residual_12': (torch.Size([2, 1280, 8, 8]), torch.float32)}
INFO:__main__:JIT tracing..
/Users/jrittvo/miniconda3/envs/python_playground/lib/python3.10/site-packages/python_coreml_stable_diffusion/layer_norm.py:61: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert inputs.size(1) == self.num_channels
Traceback (most recent call last):
File "/Users/jrittvo/miniconda3/envs/python_playground/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/jrittvo/miniconda3/envs/python_playground/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Users/jrittvo/miniconda3/envs/python_playground/lib/python3.10/site-packages/python_coreml_stable_diffusion/torch2coreml.py", line 1282, in <module>
main(args)
File "/Users/jrittvo/miniconda3/envs/python_playground/lib/python3.10/site-packages/python_coreml_stable_diffusion/torch2coreml.py", line 1147, in main
convert_unet(pipe, args)
File "/Users/jrittvo/miniconda3/envs/python_playground/lib/python3.10/site-packages/python_coreml_stable_diffusion/torch2coreml.py", line 688, in convert_unet
reference_unet = torch.jit.trace(reference_unet,
File "/Users/jrittvo/miniconda3/envs/python_playground/lib/python3.10/site-packages/torch/jit/_trace.py", line 794, in trace
return trace_module(
File "/Users/jrittvo/miniconda3/envs/python_playground/lib/python3.10/site-packages/torch/jit/_trace.py", line 1056, in trace_module
module._c._create_method_from_trace(
File "/Users/jrittvo/miniconda3/envs/python_playground/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/Users/jrittvo/miniconda3/envs/python_playground/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1488, in _slow_forward
result = self.forward(*input, **kwargs)
File "/Users/jrittvo/miniconda3/envs/python_playground/lib/python3.10/site-packages/python_coreml_stable_diffusion/unet.py", line 972, in forward
down_block_res_sample = down_block_res_sample + additional_residuals[i]
RuntimeError: The size of tensor a (96) must match the size of tensor b (64) at non-singleton dimension 3
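The failing line in unet.py adds each ControlNet residual to the corresponding UNet down-block activation elementwise, so the two tensors must agree in every dimension. With `--latent-h 96 --latent-w 96` the traced UNet activations are 96x96-based, while the `additional_residual_*` inputs in the spec above are still built for the 64x64 default. A minimal sketch of the mismatch, using NumPy arrays as stand-ins for the torch tensors (shapes copied from the log):

```python
import numpy as np

# UNet down-block activation as traced with --latent-h 96 --latent-w 96
down_block_res_sample = np.zeros((2, 320, 96, 96), dtype=np.float32)

# ControlNet residual still shaped for the 512x512 default (latent 64x64),
# as shown in the "Sample UNet inputs spec" above
additional_residual = np.zeros((2, 320, 64, 64), dtype=np.float32)

try:
    # mirrors `down_block_res_sample + additional_residuals[i]` in unet.py
    down_block_res_sample + additional_residual
except ValueError as exc:
    # NumPy refuses for the same reason torch does: 96 != 64 and neither
    # is 1, so the trailing dimensions cannot be broadcast together
    print("mismatch:", exc)
```

Note that 512x512 works by accident: there the latent size is 64x64, so the hardcoded residual shapes happen to line up.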
Thanks for the report @jrittvo! I pushed a fix for this issue with ControlNet and custom latent dimensions. Feel free to open a new issue if the problem persists for you or a related one appears.
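For anyone pinned to an older checkout, the gist of the fix is that the control-unet's `additional_residual_*` inputs must be sized from the requested latent dimensions rather than the 512x512 default. A rough sketch of the expected shapes; the channel/downsample layout below is inferred from the sample-inputs spec printed in the log, not taken from the repository:

```python
def controlnet_residual_shapes(latent_h, latent_w, batch=2):
    """Sketch of the 13 SD 1.5 down-block residual shapes for a given
    latent size (channel counts and downsample factors inferred from the
    "Sample UNet inputs spec" in the log above)."""
    # (channels, downsample factor) per additional_residual input
    layout = [(320, 1), (320, 1), (320, 1), (320, 2),
              (640, 2), (640, 2), (640, 4),
              (1280, 4), (1280, 4), (1280, 8),
              (1280, 8), (1280, 8), (1280, 8)]
    return [(batch, ch, latent_h // f, latent_w // f) for ch, f in layout]

# 512x512 -> latent 64: matches the hardcoded spec in the log
print(controlnet_residual_shapes(64, 64)[0])   # (2, 320, 64, 64)
# 768x768 -> latent 96: what the fixed converter should feed the UNet
print(controlnet_residual_shapes(96, 96)[0])   # (2, 320, 96, 96)
```

With `--latent-h 96 --latent-w 96` the residuals should run 96/96/96/48, 48/48/24, 24/24/12, 12/12/12, matching the 96x96 `sample` input instead of the 64x64 shapes the trace received.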
I am trying to convert the basic Stable Diffusion v1.5 model downloaded from https://huggingface.co/runwayml/stable-diffusion-v1-5 from the diffusers format to Core ML (ORIGINAL attention, 768x768) for use with ControlNet.
My command line is:
python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-encoder --convert-vae-decoder --unet-support-controlnet --model-version "./diffusers" --bundle-resources-for-swift-cli --attention-implementation ORIGINAL --latent-h 96 --latent-w 96 --compute-unit CPU_AND_GPU -o "./SD15-Original-768x768"
Pipeline 1 uses: coremltools 6.2, diffusers 0.14.0, python 3.8
Pipeline 2 uses: coremltools 6.3, diffusers 0.15.1, python 3.10
Behavior is identical in both pipelines.