Closed ghost closed 5 months ago
Stable Fast is faster; it's actually closer to TensorRT speeds.
https://github.com/chengzeyi/stable-fast#performance-comparison
I believe quantization is already there (?) https://github.com/chengzeyi/stable-fast#model-quantization
@dsingal0 Stable fast should be faster and deliver higher generation quality. @SuperSecureHuman Quantization is partially supported in stable-fast, but unfortunately it is not really efficient in terms of speed. To make it efficient, some CUDA kernels must be carefully written.
https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion They are also using torch.compile with fullgraph mode and some other optimizations. I imagine their compilation is much slower, but is their optimized pipeline faster or slower than stable-fast? Can we get bfloat16 and quantization/fusion support?
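For reference, a minimal sketch of the fullgraph compile mode mentioned in that tutorial; the tiny `Linear` module here is just a stand-in assumption (in the diffusers pipeline it would be something like `pipe.unet` compiled on GPU):

```python
import torch

# Stand-in module; in the fast-diffusion tutorial the compile target
# would be the pipeline's UNet, not a toy Linear layer.
model = torch.nn.Linear(4, 4)

# fullgraph=True asks the compiler to capture the whole forward pass
# as one graph and error out instead of falling back to eager mode.
compiled = torch.compile(model, fullgraph=True)

x = torch.randn(2, 4)
out = compiled(x)  # first call triggers (slow) compilation
```

The first invocation pays the compilation cost, which is why compile time matters when comparing against stable-fast's faster tracing.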