Closed sagar-a16z closed 1 year ago
Looks good, thanks! Could you just give a bit more details about the testing that you did? (typically if you can generate images with/without the features and check that they compare well, that's great)
I tested it with the following command:
cargo run --target=aarch64-apple-darwin --example stable-diffusion --features clap -- --prompt "A very rusty robot holding a fire torch." --sliced-attention-size 0 --cpu unet --cpu clip
Here's the output:
Here's the output without --sliced-attention-size 0:
cargo run --target=aarch64-apple-darwin --example stable-diffusion --features clap -- --prompt "A very rusty robot holding a fire torch." --cpu unet --cpu clip
output:
The performance difference on my M1 Mac is staggering. The first run finished in less than 2 minutes; without attention slicing it takes over 10 minutes.
Neat, that's some impressive speedup! Thanks for the PR!
Add support for automatic attention slicing, based on the Hugging Face diffusers implementation: https://github.com/huggingface/diffusers/blob/91925fbb761d944d54271660c4c3cffee55798fa/examples/community/stable_diffusion_mega.py#L96-L113
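For anyone skimming this thread: the idea behind attention slicing is to avoid materializing the full attention score matrix for all heads at once, instead processing a few heads per step so peak memory stays small. The sketch below is a minimal, dependency-free illustration in plain Rust with Vec-based math; the actual PR operates on tch tensors, and all names here (sliced_attention, slice_size, etc.) are hypothetical, not the PR's API.

```rust
/// In-place softmax over one row of attention scores.
fn softmax(row: &mut [f32]) {
    let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let sum: f32 = row.iter_mut().map(|x| { *x = (*x - max).exp(); *x }).sum();
    for x in row.iter_mut() { *x /= sum; }
}

/// q, k, v: (heads, len, dim) flattened row-major.
/// Processes `slice_size` heads per step, so the temporary score
/// buffers only ever cover one slice rather than every head at once.
fn sliced_attention(q: &[f32], k: &[f32], v: &[f32],
                    heads: usize, len: usize, dim: usize,
                    slice_size: usize) -> Vec<f32> {
    let scale = 1.0 / (dim as f32).sqrt();
    let mut out = vec![0.0f32; heads * len * dim];
    let mut h0 = 0;
    while h0 < heads {
        let h1 = (h0 + slice_size).min(heads);
        for h in h0..h1 {
            let base = h * len * dim;
            for i in 0..len {
                // Score buffer is per-row, never (heads, len, len) at once.
                let mut scores = vec![0.0f32; len];
                for j in 0..len {
                    let mut dot = 0.0;
                    for d in 0..dim {
                        dot += q[base + i * dim + d] * k[base + j * dim + d];
                    }
                    scores[j] = dot * scale;
                }
                softmax(&mut scores);
                for j in 0..len {
                    for d in 0..dim {
                        out[base + i * dim + d] += scores[j] * v[base + j * dim + d];
                    }
                }
            }
        }
        h0 = h1;
    }
    out
}

fn main() {
    // Sanity check: slicing must not change the numerical result,
    // only the peak memory profile.
    let (heads, len, dim) = (4, 3, 2);
    let data: Vec<f32> = (0..heads * len * dim).map(|i| i as f32 * 0.1).collect();
    let full = sliced_attention(&data, &data, &data, heads, len, dim, heads);
    let sliced = sliced_attention(&data, &data, &data, heads, len, dim, 1);
    let max_diff = full.iter().zip(&sliced)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    assert!(max_diff < 1e-6);
    println!("ok");
}
```

The diffusers version linked above additionally picks the slice size automatically (e.g. half the head count); the speedup reported in this thread comes from better cache behavior and reduced memory pressure on the M1, not from doing less arithmetic.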