AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Feature Request]: attention slicing #5122

Open NightMachinery opened 1 year ago

NightMachinery commented 1 year ago

Is there an existing issue for this?

What would your feature do ?

HuggingFace recommends using attention slicing on Apple Silicon (M1, M2). Is this supported in AUTOMATIC1111? Can it be added?

M1/M2 performance is very sensitive to memory pressure. The system will automatically swap if it needs to, but performance will degrade significantly when it does.

We recommend you use attention slicing to reduce memory pressure during inference and prevent swapping, particularly if your computer has less than 64 GB of system RAM, or if you generate images at non-standard resolutions larger than 512 × 512 pixels. Attention slicing performs the costly attention operation in multiple steps instead of all at once. It usually has a performance impact of ~20% in computers without universal memory, but we have observed better performance in most Apple Silicon computers, unless you have 64 GB or more.

pipeline.enable_attention_slicing()
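For intuition, here is a minimal NumPy sketch of what the quoted docs mean by performing attention "in multiple steps instead of all at once": the query rows are processed in slices, so the full scores matrix never has to be materialized at once. This is illustrative only, not the diffusers implementation; the function names and the slice size are made up for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention, all at once:
    # materializes the full (n_q, n_k) scores matrix.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def sliced_attention(q, k, v, slice_size):
    # Same result, but only a (slice_size, n_k) scores
    # matrix is live at any time, trading speed for memory.
    out = np.empty((q.shape[0], v.shape[-1]))
    for start in range(0, q.shape[0], slice_size):
        chunk = q[start:start + slice_size]
        out[start:start + slice_size] = attention(chunk, k, v)
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((64, 16))
k = rng.standard_normal((64, 16))
v = rng.standard_normal((64, 16))

# Slicing changes memory usage, not the result.
assert np.allclose(attention(q, k, v), sliced_attention(q, k, v, slice_size=8))
```

The `enable_attention_slicing()` call above toggles this kind of chunked computation inside the pipeline's attention layers.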

Proposed workflow

_

Additional information

No response

JackCopland commented 1 year ago

The notes on this page for Stable Diffusion 2 also recommend enabling this for low-memory setups. The change from 512x512 to 768x768 means more people will be hitting memory limits. This could perhaps be tied to the --medvram and --lowvram arguments?

"If you have low GPU RAM available, make sure to add a pipe.enable_attention_slicing() after sending it to cuda for less VRAM usage (at the cost of speed)"

axemaster commented 1 year ago

+1

larspohlmann commented 7 months ago

Anyone working on this?