dbolya / tomesd

Speed up Stable Diffusion with this one simple trick!
MIT License
1.27k stars 78 forks

Support for Imagen #49

Open lsabrinax opened 1 year ago

lsabrinax commented 1 year ago

Thanks for your nice work! Does tomesd only support Stable Diffusion, or can it also support other diffusion models such as Imagen?

dbolya commented 1 year ago

I believe Imagen uses just convnets for its U-Net, not a transformer like Stable Diffusion does. So in that respect, it can't be used the way I use it for Stable Diffusion here. However, if the underlying network has self-attention modules or uses a transformer in some way, then it's possible to use it. I'm unsure how (or if) that would apply to Imagen, though.
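One rough way to check the condition described above (whether a network contains attention modules at all) is to walk its submodules and look at their class names. The sketch below is only a heuristic, not something tomesd provides: `find_attention_modules` and its `keyword` parameter are hypothetical names, the only real API it relies on is PyTorch's `named_modules()`, and matching on class names cannot tell self-attention apart from cross-attention.

```python
from typing import Any, List, Tuple


def find_attention_modules(model: Any, keyword: str = "attention") -> List[Tuple[str, str]]:
    """Return (module_path, class_name) pairs whose class name contains `keyword`.

    `model` is assumed to expose `named_modules()` the way torch.nn.Module does.
    This is a name-based heuristic, so an empty result is a hint, not proof.
    """
    hits = []
    for name, module in model.named_modules():
        class_name = type(module).__name__
        if keyword.lower() in class_name.lower():
            hits.append((name, class_name))
    return hits
```

For a diffusers pipeline this could be called as `find_attention_modules(pipe.unet)`. If the list comes back empty, there is probably nothing for token merging to hook into.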

lsabrinax commented 1 year ago

Thanks for your reply, I'll try it on Imagen later. I tried it on Stable Diffusion first, running on an A30 GPU. With ratio=0.5, the time cost went from 1.4 to 0.939 (1.5x) and GPU memory went from 17648MB to 15576MB, so the improvement is not as good as reported in the README. And when I set ratio=0.6, both the time cost and GPU memory are greater than with ratio=0.5. What could be the reason? How can I reproduce the reported results?
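For reference, the A30 measurements quoted above can be turned into a speedup factor and a relative memory saving with a couple of lines of arithmetic. The helper names `speedup` and `memory_saving` are made up for this sketch. The inputs are the figures from the comment.

```python
def speedup(baseline_time: float, patched_time: float) -> float:
    """How many times faster the patched run is than the baseline."""
    return baseline_time / patched_time


def memory_saving(baseline_mb: float, patched_mb: float) -> float:
    """Fraction of GPU memory saved relative to the baseline."""
    return 1.0 - patched_mb / baseline_mb


# A30 figures reported in the comment, ratio=0.5
print(f"speedup: {speedup(1.4, 0.939):.2f}x")               # speedup: 1.49x
print(f"memory saved: {memory_saving(17648, 15576):.1%}")   # memory saved: 11.7%
```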

dbolya commented 1 year ago

> and when I set ratio=0.6, the cost time and GPU memory are greater than ratio=0.5

That doesn't seem right. What environment are you in and how are you benchmarking this?

lsabrinax commented 1 year ago

I reran the following code on a V100 GPU to evaluate the performance. The torch version is 0.12.1 and the image size is 512x512.

import torch, tomesd
from diffusers import StableDiffusionPipeline
import time

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

# Apply ToMe with a 50% merging ratio
tomesd.apply_patch(pipe, ratio=0.5) # Can also use pipe.unet in place of pipe here
infer_time, count = 0.0, 0.0
for i in range(200):
    start = time.time()
    image = pipe("a photo of an astronaut riding a horse on mars").images[0]
    infer_time += time.time() - start
    count += 1
image.save("astronaut.png")
print(f'average time: {infer_time / count}')

Without tomesd, GPU memory is 6040MB and the average time is 4.055s. With tomesd and ratio=0.5, GPU memory is 5216MB and the average time is 3.5749s. The speedup is clearly smaller than the one reported in the table in the README.
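One thing that may be worth ruling out when comparing against published numbers is measurement noise from the first few iterations (CUDA initialization, kernel selection, caching). The harness below is a minimal sketch under that assumption, not the benchmarking method used for the README. `benchmark` and its parameters are hypothetical names, and the `sync` hook is just a place to pass something like `torch.cuda.synchronize` so that queued GPU work is included in the timing.

```python
import time
from typing import Callable, Optional


def benchmark(
    fn: Callable[[], object],
    *,
    warmup: int = 3,
    iters: int = 20,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the mean wall-clock seconds per call of `fn`, excluding warm-up calls."""
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work has finished before starting the clock

    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()  # wait for any queued device work before stopping the clock
    return (time.perf_counter() - start) / iters
```

With the pipeline from the script above, this could be used as `benchmark(lambda: pipe(prompt), sync=torch.cuda.synchronize)`, once with and once without `tomesd.apply_patch`. If the memory figures were read from `nvidia-smi` (an assumption, the comment does not say), they include PyTorch's cached allocations. `torch.cuda.max_memory_allocated()` reports the peak actually allocated by tensors and may give a clearer comparison.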