I did not experience any speedup when enabling attention slicing. In fact, it is a method for reducing memory usage at the cost of some performance. From the docs:

> There's a small performance penalty of about 10% slower inference times, but this method allows you to use Stable Diffusion in as little as 3.2 GB of VRAM!
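For reference, here is a minimal sketch of enabling attention slicing in diffusers; the model ID, dtype, and prompt are my own illustrative choices, not something from this thread:

```python
import torch
from diffusers import StableDiffusionPipeline

# Model ID and dtype are illustrative choices, not from this thread.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Trades roughly 10% inference speed for a much smaller peak VRAM footprint.
pipe.enable_attention_slicing()

image = pipe("a photo of an astronaut riding a horse").images[0]
```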
Hmm, I think the performance cost only shows up when there are not enough resources to process things normally. When enough VRAM is available, things get faster; I believe the slices are sometimes processed in parallel, though not always. If there is not enough VRAM, the work is queued to keep peak VRAM usage down. Still, this feature is good to have, since it enables people to run inference even on low-VRAM GPUs.
When enabling attention slicing in diffusers we can get up to 3 times the performance we get without it; here are the docs for attention slicing.
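If anyone wants to verify this on their own GPU, here is a rough timing sketch, continuing from the `pipe` created in the snippet above; the prompt, step count, and run count are arbitrary assumptions:

```python
import time
import torch

def time_inference(pipe, prompt, runs=3):
    # Average wall-clock seconds per generation over a few runs.
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        pipe(prompt, num_inference_steps=25)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs

prompt = "a photo of an astronaut riding a horse"

pipe.disable_attention_slicing()   # baseline: full attention
baseline = time_inference(pipe, prompt)

pipe.enable_attention_slicing()    # sliced attention
sliced = time_inference(pipe, prompt)

print(f"without slicing: {baseline:.2f}s/image, with slicing: {sliced:.2f}s/image")
```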