I did not experience any speedup when enabling attention slicing. In fact, it is a method for reducing memory usage at the cost of some performance. From the docs:

> There's a small performance penalty of about 10% slower inference times, but this method allows you to use Stable Diffusion in as little as 3.2 GB of VRAM!
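For reference, here is a minimal sketch of enabling attention slicing in diffusers; the model ID, dtype, and prompt are my own illustrative choices, not something from this thread:

```python
import torch
from diffusers import StableDiffusionPipeline

# Model ID and dtype are illustrative choices, not from this thread.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Trades roughly 10% inference speed for a much smaller peak VRAM footprint.
pipe.enable_attention_slicing()

image = pipe("a photo of an astronaut riding a horse").images[0]
```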
Hmm, I think the performance cost only shows up when there are not enough resources to process things normally. When enough VRAM is available, things get faster; I believe the slices are sometimes processed in parallel, though not always. If there is not enough VRAM, the work is queued to keep peak VRAM usage down. Still, this feature is good to have, since it enables people to run inference even on low-VRAM GPUs.
When enabling attention slicing in diffusers we can get up to 3 times the performance we get without it; here are the docs for attention slicing.
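If anyone wants to verify this on their own GPU, here is a rough timing sketch, continuing from the `pipe` created in the snippet above; the prompt, step count, and run count are arbitrary assumptions:

```python
import time
import torch

def time_inference(pipe, prompt, runs=3):
    # Average wall-clock seconds per generation over a few runs.
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        pipe(prompt, num_inference_steps=25)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs

prompt = "a photo of an astronaut riding a horse"

pipe.disable_attention_slicing()   # baseline: full attention
baseline = time_inference(pipe, prompt)

pipe.enable_attention_slicing()    # sliced attention
sliced = time_inference(pipe, prompt)

print(f"without slicing: {baseline:.2f}s/image, with slicing: {sliced:.2f}s/image")
```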