HVision-NKU / StoryDiffusion

Accepted as [NeurIPS 2024] Spotlight Presentation Paper
Apache License 2.0

Using quantized version with the pipeline #23

Open allthatido opened 7 months ago

allthatido commented 7 months ago

Hello

I am trying to run the comic generation notebook, but with a quantized model (SSD-1B) to fit in my 8 GB of VRAM. However, I am getting the error below:

The expanded size of the tensor (676) must match the existing size (2500) at non-singleton dimension 3. Target sizes: [2, 20, 676, 676]. Tensor sizes: [2500, 2500]

while running the line :

id_images = pipe(id_prompts, num_inference_steps=num_steps, guidance_scale=guidance_scale, height=height, width=width, negative_prompt=negative_prompt, generator=generator).images
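
The numbers in the error look like latent token counts: 676 = 26 x 26, while the precomputed mask is 2500 = 50 x 50, which suggests the consistent self-attention mask was built for a different resolution than the one being requested. A minimal sketch of that arithmetic, assuming the mask has one row per latent token and that the relevant attention layer sees the latents at a /32 downsampling (both assumptions, not taken from the code):

```python
# Hypothetical sanity check, not StoryDiffusion's actual code: the
# precomputed attention mask has one row/column per latent token, so its
# side must match the token count implied by height and width. The /32
# downsampling factor is an assumption for illustration.

def latent_tokens(height: int, width: int, down_factor: int = 32) -> int:
    """Number of tokens a self-attention layer sees at this downsampling."""
    return (height // down_factor) * (width // down_factor)

height, width = 832, 832        # hypothetical request size
mask_side = 2500                # side of the precomputed mask (50 * 50)

tokens = latent_tokens(height, width)   # 26 * 26 = 676
if tokens != mask_side:
    print(f"mask expects {mask_side} tokens but the pipeline produces "
          f"{tokens}; rebuild the mask for {height}x{width} or adjust size")
```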

Can you help solve this ?

Thanks

Z-YuPeng commented 7 months ago

We are glad to help solve your problem. I am not familiar with SSD-1B, so it may take some time; I expect to update the code in 1-2 days.

allthatido commented 7 months ago

SSD-1B is a distilled version of SDXL: rather than reducing the weight precision, the network is pruned to roughly half the parameters, which makes inference much faster with some compromise on quality. There are also many LoRA-trained models that could be used if this is implemented.

There is some tensor shape mismatch, but I am unable to figure it out. Thank you for your time.
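
For reference, SSD-1B loads through the standard SDXL pipeline in diffusers (the snippet below follows its model card); whether StoryDiffusion's custom pipeline accepts the checkpoint unchanged is an assumption:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Baseline loading path for SSD-1B via the stock SDXL pipeline. Swapping
# this into StoryDiffusion's custom pipeline is the untested part.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "segmind/SSD-1B",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
pipe.to("cuda")

# SDXL-compatible LoRA weights attach the usual way; the repo id below is
# a placeholder, not a real checkpoint.
# pipe.load_lora_weights("some-user/some-sdxl-lora")

image = pipe("a comic panel of an astronaut", num_inference_steps=30).images[0]
```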

themarshall68 commented 6 months ago

I think I'm running into the same type of problem: I get an error with all the standard models (RealVision, Unstable). I'm running on a local machine with 64 GB of RAM and an NVIDIA RTX 4090 (mobile/laptop) with 16 GB of VRAM. The error I get is:

RuntimeError: The expanded size of the tensor (1024) must match the existing size (3072) at non-singleton dimension 3. Target sizes: [2, 20, 1024, 1024]. Tensor sizes: [3072, 3072]

zombri-eats-brainz commented 6 months ago

I am running SDXL unquantized just fine on my 8GB 1080 (Fooocus), with multiple LoRAs. What else is going on that makes this thing run out of VRAM? I don't think quantized SDXL is the answer. If multiple large models need to be pipelined together (why, though?), can't the model loading and processing be handled in a better way? I'm no expert, but load one model -> process all frames -> unload model, load the next model in the pipeline -> process frames -> etc. seems like it would be memory efficient.
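
That load/run/unload pattern is roughly what diffusers' offloading hooks already do at submodule granularity; a minimal sketch, assuming the stock SDXL pipeline rather than StoryDiffusion's custom one:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Sketch of the load -> process -> unload idea using diffusers' built-in
# offloading (requires the accelerate package). enable_model_cpu_offload()
# keeps only the submodule currently executing on the GPU and parks the
# rest in system RAM.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.enable_model_cpu_offload()  # enable_sequential_cpu_offload() saves more VRAM, runs slower

image = pipe("a test prompt", num_inference_steps=30).images[0]
```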

allthatido commented 6 months ago

> I am running SDXL unquantized just fine on my 8GB 1080 (Fooocus), with multiple LoRAs. [...] load one model -> process all frames -> unload model, load the next model in the pipeline -> process frames -> etc. seems like it would be memory efficient.

Can you please share some more info on your setup and workflow? I have a GTX 1080, but I get a CUDA out-of-memory error if I try to run any of the .py files or the gradio app.
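
In the meantime, the standard diffusers memory levers may be worth trying; whether StoryDiffusion's scripts and gradio app expose them is an assumption:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Common switches for CUDA out-of-memory on small cards. These methods all
# exist on diffusers pipelines; whether StoryDiffusion's code calls them
# is an assumption.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
pipe.enable_attention_slicing()   # compute attention in chunks
pipe.enable_vae_slicing()         # decode latents one image at a time
```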