Open lucasjinreal opened 10 months ago
@lucasjinreal I just don't have enough time right now. But I provide a general script for testing any SD model, so anyone who is interested can run the comparison themselves.
examples/optimize_stable_diffusion_pipeline.py
Ubuntu 22.04.3 LTS, NVIDIA driver 545.23.06
stable_fast-1.0.3.dev20240222+torch220cu121-cp310-cp310-manylinux2014_x86_64.whl
RTX 3060 12GB, ComfyUI, batch size 4, image size 832x1216
Startup flags: LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4 python main.py --listen 0.0.0.0 --port 7860 --bf16-vae --force-fp16 --dont-upcast-attention --preview-method auto --disable-cuda-malloc
Without stable-fast: 2.78s/it
With stable-fast, CUDA graph enabled: 1.85s/it
Testing with an RTX 4060 Ti 16GB (everything else the same):
Without stable-fast: 1.86s/it
With stable-fast, CUDA graph enabled: 1.28s/it
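To put the two sets of numbers on a common footing, here is a small sketch that turns the s/it figures reported above into a speedup factor and a percentage reduction in time per iteration. The function names `speedup` and `time_saved_pct` are my own, not part of stable-fast or ComfyUI; only the four timings come from the runs above.

```python
# Timings in seconds per iteration, copied from the runs reported above.
# Keys are just labels; (baseline, with stable-fast + CUDA graph).
results = {
    "RTX 3060 12GB": (2.78, 1.85),
    "RTX 4060 Ti 16GB": (1.86, 1.28),
}


def speedup(baseline_s_per_it: float, optimized_s_per_it: float) -> float:
    """Speedup factor: how many times faster the optimized run is."""
    return baseline_s_per_it / optimized_s_per_it


def time_saved_pct(baseline_s_per_it: float, optimized_s_per_it: float) -> float:
    """Percentage reduction in time per iteration."""
    return (1.0 - optimized_s_per_it / baseline_s_per_it) * 100.0


for gpu, (base, fast) in results.items():
    print(
        f"{gpu}: {speedup(base, fast):.2f}x faster, "
        f"{time_saved_pct(base, fast):.1f}% less time per iteration"
    )
```

On these numbers both cards land at roughly 1.45x to 1.5x, so the gain looks similar across the two GPUs for this batch size and resolution. It is a single configuration, so treat it as a data point rather than a general result.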
SDXL inference speedup comparison?