Stable Diffusion implemented in C++ with the ncnn framework, supporting txt2img and img2img!
Zhihu: https://zhuanlan.zhihu.com/p/582552276
Video: https://www.bilibili.com/video/BV15g411x7Hc
txt2img performance (time per iteration and RAM)

| per-it | i7-12700 (512x512) | i7-12700 (256x256) | Snapdragon 865 (256x256) |
| --- | --- | --- | --- |
| slow | 4.85s / 5.24G (7.07G) | 1.05s / 3.58G (4.02G) | 1.6s / 2.2G (2.6G) |
| fast | 2.85s / 9.47G (11.29G) | 0.65s / 5.76G (6.20G) | |
- 2023-03-11: add img2img on Android and release a new apk
- 2023-03-10: add img2img on x86
- 2023-01-19: speed up & use less RAM on x86, dynamic shape on x86
- 2023-01-12: update to the latest ncnn code and use the optimized model, update Android, add a memory monitor
- 2023-01-05: add the 256x256 model to the x86 project
- 2023-01-04: merge and finish the MHA op on x86, enable fast GELU
All models and the exe file can be downloaded from 百度网盘 (Baidu Netdisk), Google Drive, or the Release page.
If you only need the ncnn models, you can find them in 硬件模型库-设备专用模型 (hardware model zoo, device-specific models); that is faster and free.
1. Download AutoencoderKL-fp16.bin, FrozenCLIPEmbedder-fp16.bin, UNetModel-MHA-fp16.bin, AutoencoderKL-encoder-512-512-fp16.bin and put them into the assets folder
2. Edit magic.txt; each line is:
3. Run stable-diffusion.exe
Note: please comply with the requirements of the SD model and do not use it for illegal purposes.
1. Download AutoencoderKL-fp16.bin, FrozenCLIPEmbedder-fp16.bin, UNetModel-MHA-fp16.bin, AutoencoderKL-encoder-512-512-fp16.bin and put them into the assets folder
2. Build:

```shell
cd x86/linux
mkdir -p build && cd build
cmake ..
make -j$(nproc)
```
3. Download AutoencoderKL-fp16.bin, FrozenCLIPEmbedder-fp16.bin, UNetModel-MHA-fp16.bin and put them into the build/assets folder
4. Run `./stable-diffusion-ncnn`
1. Download AutoencoderKL-fp16.bin, FrozenCLIPEmbedder-fp16.bin, UNetModel-MHA-fp16.bin and put them into the assets folder

I've uploaded the three onnx models used by Stable Diffusion, so that you can do some interesting work.
You can find them from the link above.
- ncnn (input & output): token, multiplier, cond, conds
- onnx (input & output): onnx::Reshape_0, 2271

```python
z = onnx(onnx::Reshape_0=token)   # run the onnx CLIP graph on the token ids
origin_mean = z.mean()
z *= multiplier                   # apply the per-token prompt weights
new_mean = z.mean()
z *= origin_mean / new_mean       # renormalize back to the original mean
conds = torch.concat([cond, z], dim=-2)  # append along the token axis
```
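The reweight-and-renormalize step above can be sketched as runnable NumPy; the function name, shapes, and toy values below are illustrative assumptions, and a plain array stands in for the output of the onnx/ncnn CLIP model:

```python
import numpy as np

def reweight_and_concat(z, cond, multiplier):
    """Scale token embeddings by per-token prompt weights, restore the
    original mean, then append to the conditioning built so far.

    z          : (n_tokens, dim) embeddings from the CLIP model
    cond       : (m_tokens, dim) conditioning accumulated so far
    multiplier : (n_tokens, 1) per-token prompt weights
    """
    origin_mean = z.mean()
    z = z * multiplier                # apply prompt weights
    new_mean = z.mean()
    z = z * (origin_mean / new_mean)  # renormalize to the original mean
    # concatenate along the token axis (dim=-2 in the torch pseudocode)
    return np.concatenate([cond, z], axis=-2)

# toy example (all values are made up)
z = np.ones((3, 4))
cond = np.zeros((2, 4))
mult = np.array([[1.0], [1.5], [0.5]])
conds = reweight_and_concat(z, cond, mult)
```

Here the multiplier averages to 1, so the renormalization factor is exactly 1 and only the relative token weights change.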
- ncnn (input & output): in0, in1, in2, c_in, c_out, outout
- onnx (input & output): x, t, cc, out

```python
# in0: latent, in1: timestep, in2: text conditioning
# the input is pre-scaled by c_in, the prediction is scaled by c_out and added back
outout = in0 + onnx(x=in0 * c_in, t=in1, cc=in2) * c_out
```
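The UNet wrapper above is a residual, pre/post-scaled call. A minimal NumPy sketch, with a stand-in `denoiser` lambda in place of the onnx model and made-up values (everything except the in0/in1/in2/c_in/c_out names is an assumption):

```python
import numpy as np

def unet_step(in0, in1, in2, c_in, c_out, denoiser):
    """outout = in0 + onnx(x=in0 * c_in, t=in1, cc=in2) * c_out

    in0: latent, in1: timestep, in2: text conditioning.
    """
    return in0 + denoiser(in0 * c_in, in1, in2) * c_out

# stand-in denoiser: ignores t and cc, just negates its input
fake = lambda x, t, cc: -x
out = unet_step(np.full((4,), 2.0), 10, None, c_in=0.5, c_out=1.0, denoiser=fake)
# per element: 2.0 + (-(2.0 * 0.5)) * 1.0 = 1.0
```

With c_in = c_out = 1 and an identity denoiser this reduces to `outout = 2 * in0`, which makes the residual structure easy to see.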