cumulo-autumn / StreamDiffusion

StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
Apache License 2.0

txt2img error with SDTurbo model: expected shape tensor(..., device='meta', size=(64, 320)), but got torch.Size([64, 320, 1, 1]) #173

Open FletcherD opened 1 month ago

FletcherD commented 1 month ago

I'm trying to run the txt2img example from the README. It works fine with "KBlueLeaf/kohaku-v2.1", and the README says you can also use SD-Turbo, but when I change the model to "stabilityai/sd-turbo" I get this error:

  File "/home/media/StreamDiffusion/streamDiffusion.py", line 24, in <module>
    stream.load_lcm_lora()
  File "/home/media/StreamDiffusion/src/streamdiffusion/pipeline.py", line 87, in load_lcm_lora
    self.pipe.load_lora_weights(
  File "/home/media/StreamDiffusion/.venv/lib/python3.10/site-packages/diffusers/loaders/lora.py", line 114, in load_lora_weights
    self.load_lora_into_unet(
  File "/home/media/StreamDiffusion/.venv/lib/python3.10/site-packages/diffusers/loaders/lora.py", line 463, in load_lora_into_unet
    unet.load_attn_procs(
  File "/home/media/StreamDiffusion/.venv/lib/python3.10/site-packages/diffusers/loaders/unet.py", line 300, in load_attn_procs
    load_model_dict_into_meta(lora, value_dict, device=device, dtype=dtype)
  File "/home/media/StreamDiffusion/.venv/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 155, in load_model_dict_into_meta
    raise ValueError(
ValueError: Cannot load because down.weight expected shape tensor(..., device='meta', size=(64, 320)), but got torch.Size([64, 320, 1, 1]). If you want to instead overwrite randomly initialized weights, please make sure to pass both `low_cpu_mem_usage=False` and `ignore_mismatched_sizes=True`. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.
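
Looking at the error, the two shapes hold the same number of elements and differ only by trailing singleton dimensions (a 1x1-conv-style weight vs. a linear-style one), which makes me suspect a diffusers version mismatch in how the LoRA weights are serialized. A quick illustration of the mismatch (numpy standing in for torch here):

```python
import numpy as np

# The LoRA "down" weight arrives in a 1x1-conv layout...
w_conv_style = np.zeros((64, 320, 1, 1))
# ...but the meta model expects the equivalent linear layout;
# squeezing the trailing singleton axes reconciles the two.
w_linear_style = w_conv_style.squeeze(axis=(2, 3))
print(w_linear_style.shape)  # (64, 320)
```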

Code:

import torch

from diffusers import AutoencoderTiny, StableDiffusionPipeline

from streamdiffusion import StreamDiffusion
from streamdiffusion.image_utils import postprocess_image

# You can load any model using diffusers' StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/sd-turbo").to(
    device=torch.device("cuda"),
    dtype=torch.float16,
)

# Wrap the pipeline in StreamDiffusion
# txt2img requires more denoising steps (a longer t_index_list) than img2img
# cfg_type="none" is recommended for txt2img
stream = StreamDiffusion(
    pipe,
    t_index_list=[0, 16, 32, 45],
    torch_dtype=torch.float16,
    cfg_type="none",
)

# If the loaded model is not an LCM, merge LCM-LoRA into it
stream.load_lcm_lora()
stream.fuse_lora()
# Use Tiny VAE for further acceleration
stream.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd").to(device=pipe.device, dtype=pipe.dtype)
# Enable acceleration
pipe.enable_xformers_memory_efficient_attention()

prompt = "portrait of a woman, digital painting, impressionist, colorful, personality"
# Prepare the stream
stream.prepare(prompt)

# Warm up for at least len(t_index_list) * frame_buffer_size iterations
for _ in range(4):
    stream()

# Run the stream infinitely
i = 0
while True:
    x_output = stream.txt2img()
    image = postprocess_image(x_output, output_type="pil")[0]
    image.save(f"{i}.png")
    i += 1
    input_response = input("Press Enter to continue or type 'stop' to exit: ")
    if input_response == "stop":
        break

I'm guessing I need a different version of diffusers, but there's no indication of which one in the README or anywhere else that I can see.
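
One other thing I might try (untested, and purely my assumption): SD-Turbo is already a few-step distilled model, so the LCM-LoRA merge that the traceback points at may not be needed for it at all, in which case `stream.load_lcm_lora()` / `stream.fuse_lora()` could be skipped for Turbo-family models. A hypothetical guard (`needs_lcm_lora` is my own helper, not part of StreamDiffusion):

```python
def needs_lcm_lora(model_id: str) -> bool:
    # Turbo-family models are already distilled for 1-4 step sampling;
    # other base models still need the LCM-LoRA merge to work with
    # StreamDiffusion's short denoising schedules.
    return "turbo" not in model_id.lower()

print(needs_lcm_lora("stabilityai/sd-turbo"))   # False -> skip the merge
print(needs_lcm_lora("KBlueLeaf/kohaku-v2.1"))  # True  -> keep it
```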

superadi04 commented 3 weeks ago

I have this error as well