bes-dev / stable_diffusion.openvino


Has anyone figured out how to change resolution? #101

Open ClashSAN opened 1 year ago

ClashSAN commented 1 year ago

OpenVINO still runs faster than diffusers with ONNX Runtime on CPU, but changing the output size, and converting the model from ONNX to IR format, are still not understood.

We can add DDIM support and produce good results at 8-14 steps, since the outdated diffusers version used here already has DDIM support. We could also maybe hack in DPM++ 2M, for even fewer steps.
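
For a sense of what that would involve, here is a minimal, hedged sketch of a DDIM denoising loop using the diffusers scheduler API. The scheduler settings follow the usual SD v1 configuration (an assumption), and `unet_stub` is a placeholder for this repo's actual OpenVINO UNet call:

```python
import torch
from diffusers import DDIMScheduler

def unet_stub(latents, t):
    # Placeholder for the real UNet inference call in this repo.
    return torch.randn_like(latents)

# Scheduler settings matching the usual SD v1 config (assumed here).
scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
)
scheduler.set_timesteps(12)  # within the 8-14 step range mentioned above

latents = torch.randn(1, 4, 64, 64) * scheduler.init_noise_sigma
for t in scheduler.timesteps:
    noise_pred = unet_stub(latents, t)
    latents = scheduler.step(noise_pred, t, latents).prev_sample
```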

Here are the enhancements: https://github.com/ClashSAN/portable_stable_diffusion.openvino

RedAndr commented 1 year ago

The latest diffusers version has Euler Ancestral. It's quite easy to incorporate; it just needs some conversion code. It works even faster than DDIM and gives better results most of the time.
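
A sketch of the same kind of loop with Euler Ancestral, assuming a recent diffusers version; note that Euler-family schedulers also need `scale_model_input`, which is the sort of small adaptation being referred to (`unet_stub` is again a placeholder):

```python
import torch
from diffusers import EulerAncestralDiscreteScheduler

def unet_stub(latents, t):
    # Placeholder for the real UNet inference call.
    return torch.randn_like(latents)

scheduler = EulerAncestralDiscreteScheduler(
    beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear"
)
scheduler.set_timesteps(10)

latents = torch.randn(1, 4, 64, 64) * scheduler.init_noise_sigma
for t in scheduler.timesteps:
    # Unlike DDIM, Euler-family schedulers rescale the model input per step.
    latent_in = scheduler.scale_model_input(latents, t)
    noise_pred = unet_stub(latent_in, t)
    latents = scheduler.step(noise_pred, t, latents).prev_sample
```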

Seegee commented 1 year ago

> The latest diffusers version has Euler Ancestral. It's quite easy to incorporate; it just needs some conversion code. It works even faster than DDIM and gives better results most of the time.

How would you go about using Euler with this fork? Is using a different sampling method like Euler the only way to change resolution?

ClashSAN commented 1 year ago

@Seegee No no, it's unrelated to resolution.

It's been mentioned in other issues here that the size is hard-coded somewhere, but I don't know enough to make the changes. Edit: actually, I think bes-dev mentioned it was a limitation of the converted ONNX UNet.

As for these samplers, hopefully people will make the code work with the new diffusers version.

RedAndr commented 1 year ago

Actually, it's possible to change the resolution, but you would have to convert the model to another one at the new size. Or, if you need flexible resolution, you can add dynamic axes during conversion, although that makes the model much slower, about three times.

ClashSAN commented 1 year ago

@RedAndr Are you referring to the model converted and run with https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/225-stable-diffusion-text-to-image/225-stable-diffusion-text-to-image.ipynb being slower than the one from this repository? So maybe CPU ONNX is still a better option?

RedAndr commented 1 year ago

@ClashSAN No, sorry, I meant that the model converted with dynamic axes is slower: `torch.onnx.export(dynamic_axes={"init_image": {0: "batch", 1: "channels", 2: "height", 3: "width"}})`. BTW, ONNX is also slower than OpenVINO IR.
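
Fleshed out, such an export might look like the following sketch. The wrapper, dummy input shapes, opset, and file names are assumptions for illustration, and the input keeps RedAndr's `init_image` name even though it is really the latent sample:

```python
import torch
from diffusers import UNet2DConditionModel

# Load an SD v1-style UNet (repo id is illustrative).
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
unet.eval()

class UNetWrapper(torch.nn.Module):
    """Unwraps the diffusers output dataclass into a plain tensor for ONNX."""
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states):
        return self.unet(sample, timestep, encoder_hidden_states,
                         return_dict=False)[0]

# Dummy inputs at a 512x512 baseline (64x64 latents, 77x768 CLIP embeddings).
latents = torch.randn(1, 4, 64, 64)
timestep = torch.tensor(1, dtype=torch.int64)
text_emb = torch.randn(1, 77, 768)

torch.onnx.export(
    UNetWrapper(unet),
    (latents, timestep, text_emb),
    "unet.onnx",
    input_names=["init_image", "timestep", "encoder_hidden_states"],
    output_names=["out_sample"],
    # Dynamic batch/height/width: one model serves any resolution,
    # at the roughly 3x speed cost discussed above.
    dynamic_axes={"init_image": {0: "batch", 1: "channels",
                                 2: "height", 3: "width"}},
    opset_version=14,
)
```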

ClashSAN commented 1 year ago

I didn't get https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/225-stable-diffusion-text-to-image/225-stable-diffusion-text-to-image.ipynb working yet on Colab or locally, but once I do, I'll see if I can upload some of my custom models to huggingface.

RedAndr commented 1 year ago

Locally it works OK; Colab should too, but I didn't try.

ClashSAN commented 1 year ago

@RedAndr Got it working; it used a ton of RAM to convert the UNet. My 8 GB + 24 GB of swap couldn't convert it.

https://huggingface.co/ClashSAN/openjourney-openvino-IR/tree/main

When using different sizes, did the RAM usage drop enough to fit in 8 GB?

In my environment, I go from 16 s/it (ONNX) to 13 s/it (OpenVINO), the same speeds as this repo.

RedAndr commented 1 year ago

Very good! Yes, memory consumption depends quite a lot on the resolution. But even with 128 GB of RAM, I couldn't convert a model larger than 1024x768. I don't have a formula; you need to check for yourself which resolution fits into your amount of memory.

ClashSAN commented 1 year ago

The ncnn repo is currently a tad slower on PC (5.3 s/it vs 5.6 s/it) but handles dynamic shapes properly. It uses 11.5 GB of RAM at peak for me.

Then there's another one, https://github.com/fengwang/Stable-Diffusion-NCNN (less RAM), which I haven't tested.

@RedAndr Ever thought about making a repo to guide users through this? OpenVINO will provide speed improvements on Intel hardware, and users with under 8 GB can use ONNX models; my converted f16 model takes only an additional 2.5 GB at 320x384 (good for phone users).

Then you can further quantize the model for 2x speed; at 256x320, it takes only 0.8 GB.
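
That quantization step could go through ONNX Runtime's dynamic quantization, for example. A sketch with placeholder file names, starting from an fp32 export; whether it actually reaches 2x depends on the hardware:

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Quantize the UNet weights to 8-bit; file names are placeholders.
quantize_dynamic(
    model_input="unet_fp32.onnx",
    model_output="unet_int8.onnx",
    weight_type=QuantType.QUInt8,
)
```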

RedAndr commented 1 year ago

@ClashSAN OK, here's my version of the converter: https://github.com/RedAndr/SD_PyTorch2ONNX. It has dynamic axes, so any resolution can be used. For a constant resolution, just comment out the lines with dynamic_axes. Let me know if you have questions.
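
And for the ONNX-to-IR step the thread keeps coming back to, a minimal sketch using the OpenVINO 2022.x Python API (paths are placeholders):

```python
from openvino.tools import mo
from openvino.runtime import serialize

# Convert the exported ONNX UNet to OpenVINO IR (writes .xml + .bin).
ov_model = mo.convert_model("unet.onnx")
serialize(ov_model, "unet.xml")
```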