Amblyopius / Stable-Diffusion-ONNX-FP16

Example code and documentation on how to get Stable Diffusion running with ONNX FP16 models on DirectML. Can run accelerated on all DirectML-supported cards, including AMD and Intel.
GNU General Public License v3.0

Generated image is always a completely black canvas #31

Closed · com-network closed 1 year ago

com-network commented 1 year ago

I'm trying to use the tool (CPU only) and I'm following the guide step by step. However, the generated image is always a completely black canvas.

First, I downloaded the model and converted it to ONNX:

Conversion log:

```
(sd_env) c:\Program Files\Stable-Diffusion-ONNX-FP16>python conv_sd_to_onnx.py --model_path "d:/stable-diffusion-2-1-base" --output_path "d:/stable-diffusion-2-1-base_onnx" --fp16
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\transformers\models\clip\modeling_clip.py:284: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\transformers\models\clip\modeling_clip.py:292: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\transformers\models\clip\modeling_clip.py:324: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\torch\onnx\symbolic_opset9.py:5742: UserWarning: Exporting aten::index operator of advanced indexing in opset 15 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  warnings.warn(
======== Diagnostic Run torch.onnx.export version 2.1.0.dev20230514+cpu ========
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\unet_2d_condition.py:650: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if any(s % default_overall_up_factor != 0 for s in sample.shape[-2:]):
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\resnet.py:200: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\resnet.py:205: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\resnet.py:127: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\resnet.py:140: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if hidden_states.shape[0] >= 64:
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\unet_2d_condition.py:793: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not return_dict:
======== Diagnostic Run torch.onnx.export version 2.1.0.dev20230514+cpu ========
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\autoencoder_kl.py:168: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not return_dict:
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\torch\onnx\_internal\jit_utils.py:307: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ..\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
  _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\torch\onnx\utils.py:691: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ..\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
  _C._jit_pass_onnx_graph_shape_type_inference(
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\torch\onnx\utils.py:1198: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ..\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
  _C._jit_pass_onnx_graph_shape_type_inference(
======== Diagnostic Run torch.onnx.export version 2.1.0.dev20230514+cpu ========
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\autoencoder_kl.py:193: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not return_dict:
======== Diagnostic Run torch.onnx.export version 2.1.0.dev20230514+cpu ========
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
ONNX pipeline saved to c:\stable-diffusion-2-1-base_onnx
2023-05-17 17:14:46.0759019 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 17:14:47.0292714 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 17:14:47.0359445 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-17 17:14:49.3462450 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 17:14:49.4506139 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 17:14:49.4578436 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-17 17:14:50.3127437 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 17:14:50.8373542 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 17:14:50.8446672 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-17 17:14:55.1853004 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 17:14:55.2730131 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 17:14:55.2808058 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
ONNX pipeline is loadable
```

Then I ran the generation process:

Process log:

```
(sd_env) c:\Program Files\Stable-Diffusion-ONNX-FP16>python test-txt2img.py --model "d:\stable-diffusion-2-1-base_onnx2" --size 256 --seed 0
2023-05-17 17:18:35.1594695 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 17:18:35.2571994 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 17:18:35.2638711 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-17 17:18:35.5660742 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 17:18:35.6494087 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 17:18:35.6565771 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-17 17:18:36.1736922 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 17:18:36.6198002 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 17:18:36.6280828 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-17 17:18:38.2551350 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 17:18:39.0760730 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 17:18:39.0841299 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
100%|██████████████████████████████████████████████████████████████████████████████████| 31/31 [01:27<00:00, 2.82s/it]
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\utils\pil_utils.py:38: RuntimeWarning: invalid value encountered in cast
  images = (images * 255).round().astype("uint8")
```
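The RuntimeWarning at the end of that log looks like the telltale detail: `invalid value encountered in cast` suggests the pipeline produced NaN pixel values, which get cast to 0 and show up as an all-black canvas. A minimal sketch of a check for this (assuming the diffusers `OnnxStableDiffusionPipeline` that `test-txt2img.py` uses; the model path and prompt are placeholders):

```
import numpy as np
from diffusers import OnnxStableDiffusionPipeline

# Load the converted model with the CPU execution provider.
pipe = OnnxStableDiffusionPipeline.from_pretrained(
    "d:/stable-diffusion-2-1-base_onnx", provider="CPUExecutionProvider"
)
# Request raw numpy output so values can be inspected before the uint8 cast.
result = pipe("a photo of an astronaut", output_type="np")
print("NaNs in output:", np.isnan(result.images).any())  # True -> black image
```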
My list of installed Python packages:

```
(sd_env) c:\Program Files\Stable-Diffusion-ONNX-FP16>pip list
Package                Version
---------------------- ---------------------
accelerate             0.19.0
aiofiles               23.1.0
aiohttp                3.8.4
aiosignal              1.3.1
altair                 5.0.0
antlr4-python3-runtime 4.9.3
anyio                  3.6.2
async-timeout          4.0.2
attrs                  23.1.0
blis                   0.7.9
catalogue              2.0.8
certifi                2023.5.7
charset-normalizer     3.1.0
click                  8.1.3
colorama               0.4.6
coloredlogs            15.0.1
confection             0.0.4
contourpy              1.0.7
cycler                 0.11.0
cymem                  2.0.7
diffusers              0.16.1
fastapi                0.95.1
ffmpy                  0.3.0
filelock               3.12.0
flatbuffers            23.5.9
fonttools              4.39.4
frozenlist             1.3.3
fsspec                 2023.5.0
ftfy                   6.1.1
gradio                 3.30.0
gradio_client          0.2.4
h11                    0.14.0
httpcore               0.17.0
httpx                  0.24.0
huggingface-hub        0.14.1
humanfriendly          10.0
idna                   3.4
importlib-metadata     6.6.0
Jinja2                 3.1.2
jsonschema             4.17.3
kiwisolver             1.4.4
langcodes              3.3.0
linkify-it-py          2.0.2
markdown-it-py         2.2.0
MarkupSafe             2.1.2
matplotlib             3.7.1
mdit-py-plugins        0.3.3
mdurl                  0.1.2
mpmath                 1.3.0
multidict              6.0.4
murmurhash             1.0.9
networkx               3.1
numpy                  1.24.3
omegaconf              2.3.0
onnx                   1.14.0
onnxconverter-common   1.13.0
onnxruntime-directml   1.14.1
opencv-python          4.7.0.72
orjson                 3.8.12
packaging              23.1
pandas                 2.0.1
pathy                  0.10.1
Pillow                 9.5.0
pip                    23.1.2
preshed                3.0.8
protobuf               4.23.0
psutil                 5.9.5
pydantic               1.10.7
pydub                  0.25.1
Pygments               2.15.1
pyparsing              3.0.9
pyreadline3            3.4.1
pyrsistent             0.19.3
python-dateutil        2.8.2
python-multipart       0.0.6
pytz                   2023.3
PyYAML                 6.0
regex                  2023.5.5
requests               2.30.0
safetensors            0.3.1
scipy                  1.10.1
semantic-version       2.10.0
setuptools             63.2.0
six                    1.16.0
smart-open             6.3.0
sniffio                1.3.0
spacy                  3.5.2
spacy-legacy           3.0.12
spacy-loggers          1.0.4
srsly                  2.4.6
starlette              0.26.1
sympy                  1.12
thinc                  8.1.10
tokenizers             0.13.3
toolz                  0.12.0
torch                  2.1.0.dev20230514+cpu
tqdm                   4.65.0
transformers           4.29.1
typer                  0.7.0
typing_extensions      4.5.0
tzdata                 2023.3
uc-micro-py            1.0.2
urllib3                2.0.2
uvicorn                0.22.0
wasabi                 1.1.1
wcwidth                0.2.6
websockets             11.0.3
yarl                   1.9.2
zipp                   3.15.0
```

Can anyone tell me what I have done wrong here, please?

Amblyopius commented 1 year ago

Hi,

One potential issue is that you are trying to generate at a size of 256x256. It'll be a lot slower, but you'd have to see if you get the same error when trying 512x512.
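For example, the same test invocation with only the size changed:

```
python test-txt2img.py --model "d:\stable-diffusion-2-1-base_onnx2" --size 512 --seed 0
```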

com-network commented 1 year ago

Hi, I'm running this on my very old computer with only an integrated GPU (Intel Iris Plus Graphics 640). My computer ran out of memory when I tried to generate a 512x512 image with the FP16 model, and even when converting a diffusion model to FP32.

If I use PyTorch with the original model, my computer can still generate the image; it's just that the process takes very long. This only happens with ONNX.

So is there anything else I can do in my case?

Process log:

```
(sd_env) c:\Program Files\Stable-Diffusion-ONNX-FP16>python test-txt2img.py --model "d:/stable-diffusion-2-1-base_onnx2" --size 512 --seed 0
2023-05-17 22:09:12.3060518 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 22:09:12.7540838 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 22:09:12.7601406 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-17 22:09:15.6211040 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 22:09:15.7131766 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 22:09:15.7197030 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-17 22:09:21.0718046 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 22:09:21.1620580 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 22:09:21.1702179 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-17 22:09:42.6752184 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-17 22:09:43.5292144 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-17 22:09:43.5366353 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-17 22:09:45.1198332 [E:onnxruntime:, inference_session.cc:1533 onnxruntime::InferenceSession::Initialize::::operator ()] Exception during initialization: c:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\ExecutionProvider.cpp(827)\onnxruntime_pybind11_state.pyd!00007FFB874DE1B1: (caller: 00007FFB874DDF52) Exception(2) tid(b6c) 8007000E Not enough memory resources are available to complete this operation.
Traceback (most recent call last):
  File "c:\Program Files\Stable-Diffusion-ONNX-FP16\test-txt2img.py", line 129, in <module>
    pipe = OnnxStableDiffusionPipeline.from_pretrained(args.model,
  File "c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 1039, in from_pretrained
    loaded_sub_model = load_sub_model(
  File "c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 445, in load_sub_model
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
  File "c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\pipelines\onnx_utils.py", line 205, in from_pretrained
    return cls._from_pretrained(
  File "c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\pipelines\onnx_utils.py", line 172, in _from_pretrained
    model = OnnxRuntimeModel.load_model(
  File "c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\pipelines\onnx_utils.py", line 77, in load_model
    return ort.InferenceSession(path, providers=[provider], sess_options=sess_options)
  File "c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 360, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 408, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: c:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\ExecutionProvider.cpp(827)\onnxruntime_pybind11_state.pyd!00007FFB874DE1B1: (caller: 00007FFB874DDF52) Exception(2) tid(b6c) 8007000E Not enough memory resources are available to complete this operation.
```
Amblyopius commented 1 year ago

ONNX on CPU isn't going to be much better than torch on CPU, but you could try to use less VRAM.

Convert with:

```
python conv_sd_to_onnx.py --model_path "stabilityai/stable-diffusion-2-1-base" --output_path "./model/sd2_1base-fp16-maxslicing" --fp16 --attention-slicing max
```

And then test with:

```
python test-txt2img.py --model "model\sd2_1base-fp16-maxslicing" --size 512 --seed 0 --cpu-textenc --cpuvae
```

What this does:

- `--attention-slicing max` slices the UNet attention computation so far less memory is needed at once, at some cost in speed.
- `--cpu-textenc` and `--cpuvae` run the text encoder and the VAE on the CPU, leaving the limited GPU memory for the UNet.
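For reference, a rough sketch of the same attention-slicing idea in plain diffusers (not necessarily how `conv_sd_to_onnx.py` applies it internally):

```
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float32
)
# "max" computes attention one slice at a time: lowest peak memory, slower.
pipe.enable_attention_slicing("max")
```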

com-network commented 1 year ago

Thank you for the guidance. I did the conversion. Then I tried to generate a 512x512 image, but the CPU couldn't bear the load and the whole system went down due to high temperature. Then I tried to generate a 256x256 image, but all I got was some random noise similar to this.

So I guess there is nothing more to be done in this case, am I right?

Conversion log:

```
(sd_env) c:\Program Files\Stable-Diffusion-ONNX-FP16>python conv_sd_to_onnx.py --model_path "c:/stable-diffusion-2-1-base" --output_path "c:/stable-diffusion-2-1-base_onnx2_fp16_2" --fp16 --attention-slicing max
WARNING: attention_slicing max implies --notune
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\transformers\models\clip\modeling_clip.py:284: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\transformers\models\clip\modeling_clip.py:292: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\transformers\models\clip\modeling_clip.py:324: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\torch\onnx\symbolic_opset9.py:5742: UserWarning: Exporting aten::index operator of advanced indexing in opset 15 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  warnings.warn(
======== Diagnostic Run torch.onnx.export version 2.1.0.dev20230514+cpu ========
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\unet_2d_condition.py:650: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if any(s % default_overall_up_factor != 0 for s in sample.shape[-2:]):
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\resnet.py:200: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\resnet.py:205: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\resnet.py:127: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\resnet.py:140: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if hidden_states.shape[0] >= 64:
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\unet_2d_condition.py:793: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not return_dict:
======== Diagnostic Run torch.onnx.export version 2.1.0.dev20230514+cpu ========
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\autoencoder_kl.py:168: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not return_dict:
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\torch\onnx\_internal\jit_utils.py:307: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ..\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
  _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\torch\onnx\utils.py:691: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ..\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
  _C._jit_pass_onnx_graph_shape_type_inference(
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\torch\onnx\utils.py:1198: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at ..\torch\csrc\jit\passes\onnx\constant_fold.cpp:181.)
  _C._jit_pass_onnx_graph_shape_type_inference(
======== Diagnostic Run torch.onnx.export version 2.1.0.dev20230514+cpu ========
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\models\autoencoder_kl.py:193: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not return_dict:
======== Diagnostic Run torch.onnx.export version 2.1.0.dev20230514+cpu ========
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
ONNX pipeline saved to c:\stable-diffusion-2-1-base_onnx2_fp16_2
2023-05-18 01:08:10.5904232 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-18 01:08:10.7283196 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-18 01:08:10.7356142 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-18 01:08:13.6457094 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-18 01:09:14.3149826 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-18 01:09:14.3228648 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-18 01:10:13.8620230 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-18 01:10:14.5491577 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-18 01:10:14.5571012 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-18 01:10:15.6615388 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-18 01:10:15.7914499 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-18 01:10:15.8014417 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
ONNX pipeline is loadable
```
Process log:

```
(sd_env) c:\Program Files\Stable-Diffusion-ONNX-FP16>python test-txt2img.py --model "c:/stable-diffusion-2-1-base_onnx2_fp16_2" --size 512 --seed 0 --cpu-textenc --cpuvae
2023-05-18 02:09:02.8381972 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-18 02:09:53.7348238 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-18 02:09:53.7432381 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
 23%|██████████████████▋                                                                | 7/31 [04:08<14:12, 35.52s/it]
Traceback (most recent call last):
  File "c:\Program Files\Stable-Diffusion-ONNX-FP16\test-txt2img.py", line 131, in <module>
    image = pipe(prompt, width, height, num_inference_steps, guidance_scale,
  File "c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_onnx_stable_diffusion.py", line 409, in __call__
    noise_pred = self.unet(sample=latent_model_input, timestep=timestep, encoder_hidden_states=prompt_embeds)
  File "c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\diffusers\pipelines\onnx_utils.py", line 60, in __call__
    return self.model.run(None, inputs)
  File "c:\Program Files\Stable-Diffusion-ONNX-FP16\sd_env\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
KeyboardInterrupt
^C
(sd_env) c:\Program Files\Stable-Diffusion-ONNX-FP16>python test-txt2img.py --model "c:/stable-diffusion-2-1-base_onnx2_fp16_2" --size 256 --seed 0 --cpu-textenc --cpuvae
2023-05-18 02:15:38.9763892 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-05-18 02:16:30.0826798 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-18 02:16:30.0897239 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
100%|██████████████████████████████████████████████████████████████████████████████████| 31/31 [01:47<00:00, 3.46s/it]
```
Amblyopius commented 1 year ago

I'm afraid the Iris GPU just doesn't cut it. I'll see if I can find the time and a device to try it out. There may be a way using 2023 OpenVINO (generally not well documented). If that bears any fruit, I'll see whether it makes sense to expand the ONNX Runtime angle beyond DirectML, or whether I have to put it in a separate project. It would probably still require at least 8GB of RAM, and more likely 16GB (but that still covers a lot of computers that can currently only run on CPU).
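For anyone who wants to experiment in the meantime, here is a rough sketch of the OpenVINO route via Hugging Face's optimum-intel package (a separate project from this repo, and untested on the hardware discussed above):

```
# pip install optimum[openvino]
from optimum.intel import OVStableDiffusionPipeline

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly.
pipe = OVStableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", export=True
)
image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```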

If you're limited to CPU, you can just use torch on CPU, but note that it'll be slow regardless.
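A minimal sketch of that torch-on-CPU fallback (the prompt and step count are placeholders):

```
import torch
from diffusers import StableDiffusionPipeline

# CPU inference needs FP32; FP16 ops are poorly supported on CPU in torch.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float32
)
pipe.enable_attention_slicing()  # keeps peak RAM down on small machines
# Fewer steps trades quality for time; CPU inference is slow regardless.
image = pipe("a photo of an astronaut", num_inference_steps=20).images[0]
image.save("out.png")
```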