Function to return a pillow image or np array

Ednaordinary commented 1 week ago

This is really cool! It would be useful to have a function like this that lets us handle the image directly, instead of writing to a canvas. This could be useful for quickly saving images from Stable Diffusion instead of waiting for the latents to transfer from GPU to CPU. It would be especially useful for intermediate progress images, or even SVD (depending on the implementation), since video latents are big.

OutofAi commented 1 week ago

Thanks for the feedback. I am not entirely sure what you mean. If I understood correctly and you want to visualise the Stable Diffusion output during the sampling steps, you should already be able to do so with cudacanvas: you only need to pass the pipeline an additional callback function that runs cudacanvas after each step.

It would be something like this:


import warnings
warnings.filterwarnings("ignore")
from diffusers import StableDiffusionPipeline
import torch
import cudacanvas

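def display_tensors(pipe, step, timestep, callback_kwargs):
    # Latents for the current step, as passed in by the pipeline
    latents = callback_kwargs["latents"]

    with torch.no_grad():
        # Decode the latents to image space, then rescale to [0, 1] for display
        image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor, return_dict=False)[0]
        image = image - image.min()
        image = image / image.max()

    # Show the decoded image in the cudacanvas window
    cudacanvas.im_show(image.squeeze(0))

    # If the window was closed, release it and ask the pipeline to stop sampling
    if cudacanvas.should_close():
        cudacanvas.clean_up()
        pipe._interrupt = True

    return callback_kwargs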

pipeline = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base",
    torch_dtype=torch.float16,
    variant="fp16"
).to("cuda")

image = pipeline(
    prompt="A croissant shaped like a cute bear.",
    negative_prompt="Deformed, ugly, bad anatomy",
    callback_on_step_end=display_tensors,
    callback_on_step_end_tensor_inputs=["latents"],
).images[0]

cudacanvas.clean_up()

Ednaordinary commented 1 week ago

Thanks for your response. This is what I meant, except that I need a method that returns a PIL image (or an np array to be converted) so it can be used elsewhere (my use case does not involve displaying to a window).

I also use TAESD because it's fast and good for immediate display.

OutofAi commented 1 week ago

With PIL and numpy you get the CPU transfer, which is not the idea behind this module. You can modify the callback to store the image from each step in a list, and then convert it to numpy or PIL later, based on your requirements.
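
A minimal sketch of that suggestion, assuming PyTorch and Pillow are installed. The names `frames`, `store_frame` and `to_pil` are hypothetical helpers, not part of cudacanvas or diffusers, and random tensors stand in for the decoded VAE output. In a real callback, `store_frame` would be called where `cudacanvas.im_show` is called above.

```python
import torch
from PIL import Image

frames = []  # per-step images, kept on the device they were produced on


def store_frame(image):
    """Rescale a (1, 3, H, W) or (3, H, W) tensor to [0, 1] and keep it for later."""
    image = image.detach().float()
    if image.dim() == 4:
        image = image.squeeze(0)
    image = image - image.min()
    image = image / image.max().clamp(min=1e-8)  # avoid dividing by zero
    frames.append(image)


def to_pil(image):
    """Convert a (3, H, W) tensor in [0, 1] to a PIL image.

    This is the point where the data is copied to CPU memory.
    """
    array = (image * 255).round().to(torch.uint8)
    array = array.permute(1, 2, 0).contiguous().cpu().numpy()  # (H, W, 3)
    return Image.fromarray(array)


device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for three sampling steps; each tensor plays the role of a decoded image.
for _ in range(3):
    store_frame(torch.randn(1, 3, 64, 64, device=device))

# Conversion is deferred until after sampling, so the loop itself never touches the CPU.
pil_images = [to_pil(frame) for frame in frames]
```

Deferring the conversion keeps the sampling loop free of transfers, at the cost of holding every stored frame in device memory until it is converted.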

Ednaordinary commented 1 week ago

Of course. My goal is to eliminate that CPU transfer while still getting the PIL image; otherwise the callback step takes a while (even with TAESD). I benchmarked it a while ago, and the CPU transfer was the longest step by far.
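
A rough way to measure that transfer, sketched under the assumption that PyTorch is installed; `transfer_seconds` is a hypothetical helper and the random tensor stands in for a decoded image. A PIL image lives in CPU memory, so one copy is still needed, but converting to `uint8` on the device first shrinks the copy to a quarter of the `float32` size (half of the `float16` size). Whether that helps in practice depends on the hardware, so the sketch only prints timings.

```python
import time

import torch


def transfer_seconds(tensor, repeats=5):
    """Average wall-clock seconds for one copy of `tensor` to CPU memory."""
    on_gpu = tensor.is_cuda
    if on_gpu:
        torch.cuda.synchronize()  # finish pending GPU work before timing
    start = time.perf_counter()
    for _ in range(repeats):
        tensor.cpu()
    if on_gpu:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats


device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for a decoded image with values in [0, 1]
decoded = torch.rand(1, 3, 512, 512, device=device)

# Quantise on the same device, before any transfer takes place
quantised = (decoded * 255).round().to(torch.uint8)

float_bytes = decoded.element_size() * decoded.numel()
uint8_bytes = quantised.element_size() * quantised.numel()

print(f"float32: {float_bytes} bytes, {transfer_seconds(decoded):.6f} s per copy")
print(f"uint8:   {uint8_bytes} bytes, {transfer_seconds(quantised):.6f} s per copy")
```

On a machine without CUDA, `tensor.cpu()` returns the tensor unchanged, so the timings are only meaningful on a GPU.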

OutofAi / cudacanvas

Function to return a pillow image or np array #5