albarji / mixture-of-diffusers

Mixture of Diffusers for scene composition and high resolution image generation
MIT License
414 stars 37 forks

Large VRAM used when using a guide image for the whole canvas #17

Closed zhangmxxx closed 3 weeks ago

zhangmxxx commented 4 weeks ago

Problem

I was trying to reproduce eyeguided.png from your article, but I encountered a "CUDA out of memory" error, and the log says the process had more than 20 GiB of VRAM in use. The "CUDA out of memory" problem disappears when I remove the guide image from my generation settings.

Generation setting to reproduce the problem

python environment

Load and preprocess guide image

```python
iic_image = preprocess_image(Image.open("eyeguided_sketch.png").convert("RGB"))  # The sketch was taken from a screenshot.
```

Mixture of Diffusers generation

```python
image = pipeline(
    canvas_height=2160,
    canvas_width=3840,
    regions=[
        Text2ImageRegion(0, 480, 0, 640, guidance_scale=8,
            prompt=f"Abstract decorative illustration, by jackson pollock, elegant, intricate, highly detailed, smooth, sharp focus, vibrant colors, artstation, stunning masterpiece"),
        # ...... 8 * 11 grid of Text2ImageRegions
        Image2ImageRegion(0, 2160, 0, 3840, reference_image=iic_image, strength=0.25),
    ],
    num_inference_steps=50,
    seed=7178915308,
)["sample"][0]
```
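For scale: the dominant VRAM cost of a full-canvas guide image is not the latents themselves but the VAE encoder's intermediate activations at full canvas resolution. A rough back-of-envelope estimate of my own (assuming the standard Stable Diffusion VAE, which keeps 128 channels at full resolution in its first encoder block and downsamples by 8x overall, in fp32):

```python
# Rough VRAM estimate for VAE-encoding a full 3840x2160 reference image.
# Assumptions (not stated in the issue): standard SD VAE, fp32,
# 128 channels in the first encoder block, 8x total downsampling,
# 4 latent channels.
H, W = 2160, 3840
bytes_per_float = 4  # fp32

# One activation tensor of the first encoder block at full resolution:
first_block = 128 * H * W * bytes_per_float
print(f"first encoder activation: {first_block / 2**30:.1f} GiB")  # ~4.0 GiB

# The latents, by comparison, are tiny:
latents = 4 * (H // 8) * (W // 8) * bytes_per_float
print(f"latents: {latents / 2**20:.1f} MiB")  # ~2.0 MiB
```

Several such activation tensors are alive at once during the encoder's forward pass, so exceeding 20 GiB on a full-canvas encode seems plausible.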



### Expected 
Could you tell me if I have a misunderstanding of how to use the guide image? Or maybe my method of obtaining the sketch image is wrong?
zhangmxxx commented 3 weeks ago

It turns out that, during the 'Prepare image latents' phase of the forward process in StableDiffusionCanvasPipeline.__call__(), the cpu_vae option is ignored when calling encode_reference_image(). The details are as follows:

```python
# line 342 in canvas.py
for region in image2image_regions:
    # cpu_vae is not passed to encode_reference_image()
    # It will use VRAM by default
    region.encode_reference_image(self.vae, device=self.device, generator=generator)
```
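Until this is fixed upstream, one workaround is to run the reference-image encode on CPU and move only the compact latents back to the GPU, mirroring what cpu_vae does elsewhere. Below is only a sketch of that pattern, not the project's actual patch: the tiny stand-in "vae" (a single strided conv) and the helper name encode_on_cpu are mine for illustration; in canvas.py you would encode with the pipeline's real VAE instead.

```python
import torch

# Stand-in "VAE encoder": a single 8x-downsampling conv, just to demonstrate
# the device dance. Not the real Stable Diffusion VAE.
vae = torch.nn.Conv2d(3, 4, kernel_size=8, stride=8)

def encode_on_cpu(vae, image, target_device):
    """Encode `image` with the VAE on CPU, then move latents to `target_device`.

    The huge intermediate activations stay in system RAM instead of VRAM;
    only the small latent tensor travels to the GPU.
    """
    vae_cpu = vae.to("cpu")
    with torch.no_grad():
        latents = vae_cpu(image.to("cpu"))
    return latents.to(target_device)

device = "cuda" if torch.cuda.is_available() else "cpu"
image = torch.rand(1, 3, 2160, 3840)  # full-canvas reference image
latents = encode_on_cpu(vae, image, device)
print(latents.shape)  # torch.Size([1, 4, 270, 480])
```

The trade-off is speed: a CPU encode of a 4K image is noticeably slower than on GPU, but it happens only once per Image2ImageRegion, before denoising starts.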