Closed sagar-a16z closed 1 year ago
Looks good, thanks! Could you just give a bit more details about the testing that you did? (typically if you can generate images with/without the features and check that they compare well, that's great)
I tested it with the following command:
cargo run --target=aarch64-apple-darwin --example stable-diffusion --features clap -- --prompt "A very rusty robot holding a fire torch." --sliced-attention-size 0 --cpu unet --cpu clip
Here's the output:
Here's the output without --sliced-attention-size 0:
cargo run --target=aarch64-apple-darwin --example stable-diffusion --features clap -- --prompt "A very rusty robot holding a fire torch." --cpu unet --cpu clip
output:
The performance difference on my M1 Mac is staggering. The first run finished in less than 2 minutes; without attention slicing it takes over 10 minutes.
Neat, that's some impressive speedup! Thanks for the PR!
Add support for automatic attention slicing, based on the Hugging Face diffusers implementation: https://github.com/huggingface/diffusers/blob/91925fbb761d944d54271660c4c3cffee55798fa/examples/community/stable_diffusion_mega.py#L96-L113
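For anyone skimming this thread: the idea behind attention slicing is to avoid materializing the full attention score matrix for all heads at once, instead processing a few heads per step so peak memory stays small. The sketch below is a minimal, dependency-free illustration in plain Rust with Vec-based math; the actual PR operates on tch tensors, and all names here (sliced_attention, slice_size, etc.) are hypothetical, not the PR's API.

```rust
/// In-place softmax over one row of attention scores.
fn softmax(row: &mut [f32]) {
    let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let sum: f32 = row.iter_mut().map(|x| { *x = (*x - max).exp(); *x }).sum();
    for x in row.iter_mut() { *x /= sum; }
}

/// q, k, v: (heads, len, dim) flattened row-major.
/// Processes `slice_size` heads per step, so the temporary score
/// buffers only ever cover one slice rather than every head at once.
fn sliced_attention(q: &[f32], k: &[f32], v: &[f32],
                    heads: usize, len: usize, dim: usize,
                    slice_size: usize) -> Vec<f32> {
    let scale = 1.0 / (dim as f32).sqrt();
    let mut out = vec![0.0f32; heads * len * dim];
    let mut h0 = 0;
    while h0 < heads {
        let h1 = (h0 + slice_size).min(heads);
        for h in h0..h1 {
            let base = h * len * dim;
            for i in 0..len {
                // Score buffer is per-row, never (heads, len, len) at once.
                let mut scores = vec![0.0f32; len];
                for j in 0..len {
                    let mut dot = 0.0;
                    for d in 0..dim {
                        dot += q[base + i * dim + d] * k[base + j * dim + d];
                    }
                    scores[j] = dot * scale;
                }
                softmax(&mut scores);
                for j in 0..len {
                    for d in 0..dim {
                        out[base + i * dim + d] += scores[j] * v[base + j * dim + d];
                    }
                }
            }
        }
        h0 = h1;
    }
    out
}

fn main() {
    // Sanity check: slicing must not change the numerical result,
    // only the peak memory profile.
    let (heads, len, dim) = (4, 3, 2);
    let data: Vec<f32> = (0..heads * len * dim).map(|i| i as f32 * 0.1).collect();
    let full = sliced_attention(&data, &data, &data, heads, len, dim, heads);
    let sliced = sliced_attention(&data, &data, &data, heads, len, dim, 1);
    let max_diff = full.iter().zip(&sliced)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    assert!(max_diff < 1e-6);
    println!("ok");
}
```

The diffusers version linked above additionally picks the slice size automatically (e.g. half the head count); the speedup reported in this thread comes from better cache behavior and reduced memory pressure on the M1, not from doing less arithmetic.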