Open wzq728 opened 1 year ago

Thanks for your nice work! I want to know whether tomesd supports SDXL, and if so, how to use it.
I haven't looked into it. How does SDXL differ from normal SD? If it's similar, there's probably a way to get it to work.
I haven't done any detailed tests, but wrapping a Hugging Face pipeline's unet seems to work without crashing for both training and inference, and produces images that look OK. This is with r=0.5 at 672x672.
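For context, a minimal sketch of what that wrapping might look like with diffusers (the prompt here is just a placeholder; ratio=0.5 matches the r=0.5 above):

import torch
import tomesd
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")

# Wrap just the UNet, as described above. ratio=0.5 lets ToMe merge
# up to 50% of the tokens in each patched block.
tomesd.apply_patch(pipe.unet, ratio=0.5)

image = pipe("a laundromat on a rainy day, charcoal drawing",
             width=672, height=672).images[0]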
Does it actually speed it up? I think the default behavior of the diffusers patching is to silently do nothing when it wraps the wrong thing, so it might not actually be doing anything.
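One quick way to check, if I'm reading the tomesd source right, is that apply_patch swaps the transformer blocks for ToMe subclasses, so counting modules whose class name contains "ToMe" should come out nonzero after a successful patch. A sketch, continuing from the snippet above (and hedged, since this leans on tomesd internals):

# Sanity check: assumes (from reading the tomesd source) that apply_patch
# replaces transformer blocks with "ToMe*" subclasses.
num_patched = sum(1 for m in pipe.unet.modules()
                  if "ToMe" in type(m).__name__)
print(f"patched blocks: {num_patched}")  # 0 would mean the patch was a no-op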
import time

import torch
import tomesd
from diffusers import StableDiffusionXLPipeline

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

batch_size = 4
resolution = 896
trials = 2
prompt = (
    "Laundromat Stories: Inside a laundromat on a rainy day. People load "
    "clothes into washing machines and read magazines while waiting. "
    "Charcoal drawing, chiaroscuro, dramatic lighting from overhead "
    "fluorescents."
)

# Baseline: plain SDXL.
tt = 0
for _ in range(trials):
    st = time.time()
    pipeline(prompt=prompt, num_inference_steps=20,
             num_images_per_prompt=batch_size,
             width=resolution, height=resolution)
    tt += time.time() - st
print("SDXL no tomesd: avg time", tt / trials)

# Patch with token merging and time the same workload again.
pipeline = tomesd.apply_patch(pipeline, ratio=0.75, max_downsample=4)
tt = 0
for _ in range(trials):
    st = time.time()
    pipeline(prompt=prompt, num_inference_steps=20,
             num_images_per_prompt=batch_size,
             width=resolution, height=resolution)
    tt += time.time() - st
print("SDXL w/ tomesd: avg time", tt / trials)
I get around a 12% speedup on a 3090: 18.9267s vs 16.891s
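If you want to re-time the unpatched model on the same pipeline object (the first timed run can be inflated by CUDA warm-up), tomesd also exposes remove_patch. A sketch, continuing from the script above:

# Undo the token-merging patch and time the baseline again, now that the
# GPU is warm, to make sure the gap isn't just first-run overhead.
tomesd.remove_patch(pipeline)

tt = 0
for _ in range(trials):
    st = time.time()
    pipeline(prompt=prompt, num_inference_steps=20,
             num_images_per_prompt=batch_size,
             width=resolution, height=resolution)
    tt += time.time() - st
print("SDXL unpatched again: avg time", tt / trials)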