albarji / mixture-of-diffusers

Mixture of Diffusers for scene composition and high resolution image generation
MIT License
414 stars 37 forks

Large VRAM used when using a guide image for the whole canvas #17

Closed zhangmxxx closed 3 weeks ago

zhangmxxx commented 4 weeks ago

Problem

I was trying to reproduce eyeguided.png from your article, but I encountered a "CUDA out of memory" error, and the log says the process had more than 20 GiB of VRAM in use. The "CUDA out of memory" problem disappears when I remove the guide image from my generation settings.

Generation setting to reproduce the problem

python environment

Load and preprocess guide image

```python
iic_image = preprocess_image(Image.open("eyeguided_sketch.png").convert("RGB"))  # The sketch was taken from a screenshot.
```

Mixture of Diffusers generation

```python
image = pipeline(
    canvas_height=2160,
    canvas_width=3840,
    regions=[
        Text2ImageRegion(0, 480, 0, 640, guidance_scale=8,
            prompt=f"Abstract decorative illustration, by jackson pollock, elegant, intricate, highly detailed, smooth, sharp focus, vibrant colors, artstation, stunning masterpiece"),
        # ...... 8 * 11 grid of Text2ImageRegions
        Image2ImageRegion(0, 2160, 0, 3840, reference_image=iic_image, strength=0.25),
    ],
    num_inference_steps=50,
    seed=7178915308,
)["sample"][0]
```
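For scale: the dominant VRAM cost of a full-canvas guide image is not the latents themselves but the VAE encoder's intermediate activations at full canvas resolution. A rough back-of-envelope estimate of my own (assuming the standard Stable Diffusion VAE, which keeps 128 channels at full resolution in its first encoder block and downsamples by 8x overall, in fp32):

```python
# Rough VRAM estimate for VAE-encoding a full 3840x2160 reference image.
# Assumptions (not stated in the issue): standard SD VAE, fp32,
# 128 channels in the first encoder block, 8x total downsampling,
# 4 latent channels.
H, W = 2160, 3840
bytes_per_float = 4  # fp32

# One activation tensor of the first encoder block at full resolution:
first_block = 128 * H * W * bytes_per_float
print(f"first encoder activation: {first_block / 2**30:.1f} GiB")  # ~4.0 GiB

# The latents, by comparison, are tiny:
latents = 4 * (H // 8) * (W // 8) * bytes_per_float
print(f"latents: {latents / 2**20:.1f} MiB")  # ~2.0 MiB
```

Several such activation tensors are alive at once during the encoder's forward pass, so exceeding 20 GiB on a full-canvas encode seems plausible.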



### Expected 
Could you tell me if I have a misunderstanding of how to use the guide image? Or maybe my method of obtaining the sketch image is wrong?
zhangmxxx commented 3 weeks ago

It turns out that, during the 'Prepare image latents' phase of the forward process in StableDiffusionCanvasPipeline.__call__(), the cpu_vae option is ignored when calling encode_reference_image(). The details are as follows:

```python
# line 342 in canvas.py
for region in image2image_regions:
    # cpu_vae is not passed to encode_reference_image()
    # It will use VRAM by default
    region.encode_reference_image(self.vae, device=self.device, generator=generator)
```
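Until this is fixed upstream, one workaround is to run the reference-image encode on CPU and move only the compact latents back to the GPU, mirroring what cpu_vae does elsewhere. Below is only a sketch of that pattern, not the project's actual patch: the tiny stand-in "vae" (a single strided conv) and the helper name encode_on_cpu are mine for illustration; in canvas.py you would encode with the pipeline's real VAE instead.

```python
import torch

# Stand-in "VAE encoder": a single 8x-downsampling conv, just to demonstrate
# the device dance. Not the real Stable Diffusion VAE.
vae = torch.nn.Conv2d(3, 4, kernel_size=8, stride=8)

def encode_on_cpu(vae, image, target_device):
    """Encode `image` with the VAE on CPU, then move latents to `target_device`.

    The huge intermediate activations stay in system RAM instead of VRAM;
    only the small latent tensor travels to the GPU.
    """
    vae_cpu = vae.to("cpu")
    with torch.no_grad():
        latents = vae_cpu(image.to("cpu"))
    return latents.to(target_device)

device = "cuda" if torch.cuda.is_available() else "cpu"
image = torch.rand(1, 3, 2160, 3840)  # full-canvas reference image
latents = encode_on_cpu(vae, image, device)
print(latents.shape)  # torch.Size([1, 4, 270, 480])
```

The trade-off is speed: a CPU encode of a 4K image is noticeably slower than on GPU, but it happens only once per Image2ImageRegion, before denoising starts.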