chengzeyi / stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
MIT License
1.16k stars · 71 forks

SDXL inference speed-up comparison? #63

Open lucasjinreal opened 10 months ago

lucasjinreal commented 10 months ago

Is there a speed-up comparison for SDXL inference?

chengzeyi commented 10 months ago

@lucasjinreal I just don't have enough time right now, but I provide a general script to test any SD model, so anyone who is interested can test on their own:

examples/optimize_stable_diffusion_pipeline.py
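The linked script aside, the measurement itself is simple to reproduce. Below is a minimal, self-contained timing helper (pure Python; the function name and defaults are my own, not from the repo) that reports seconds per iteration the same way the numbers in this thread are reported:

```python
import time

def seconds_per_iteration(fn, warmup=3, iters=10):
    """Return the mean wall-clock seconds per call of fn.

    Warm-up calls are excluded so one-time costs (JIT tracing,
    CUDA graph capture) don't skew the steady-state number.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters
```

In a real benchmark, `fn` would wrap a full pipeline call (and should end with `torch.cuda.synchronize()` so async GPU work is counted); run it once on the plain pipeline and once after applying stable-fast's compilation to get a fair before/after comparison.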

jkrauss82 commented 7 months ago

Ubuntu 22.04.3 LTS, NVIDIA driver 545.23.06, stable_fast-1.0.3.dev20240222+torch220cu121-cp310-cp310-manylinux2014_x86_64.whl

RTX 3060 12GB, ComfyUI, batch size 4, image size 832x1216

Startup flags: LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4 python main.py --listen 0.0.0.0 --port 7860 --bf16-vae --force-fp16 --dont-upcast-attention --preview-method auto --disable-cuda-malloc

- without stable-fast: 2.78 s/it
- stable-fast with CUDA graph enabled: 1.85 s/it

jkrauss82 commented 7 months ago

Testing with an RTX 4060 Ti 16GB (everything else the same):

- without stable-fast: 1.86 s/it
- stable-fast with CUDA graph enabled: 1.28 s/it
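Since s/it is time per step, the speed-up is the baseline divided by the optimized time; the two reports above work out to roughly 1.5x (RTX 3060) and 1.45x (RTX 4060 Ti). A quick sanity check, using the figures posted in this thread:

```python
def speedup(baseline_s_per_it, optimized_s_per_it):
    # Lower s/it is better, so speed-up = baseline / optimized.
    return baseline_s_per_it / optimized_s_per_it

print(round(speedup(2.78, 1.85), 2))  # RTX 3060     -> 1.5
print(round(speedup(1.86, 1.28), 2))  # RTX 4060 Ti  -> 1.45
```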