Open Ednaordinary opened 1 week ago
Thanks for the feedback. I am not entirely sure what you mean. If I understood correctly and you want to visualise the Stable Diffusion output while the inference steps are being sampled, you should already be able to do that with cudacanvas: pass a callback to the pipeline that runs cudacanvas after each step.
It would be something like this:
```python
import warnings
warnings.filterwarnings("ignore")

from diffusers import StableDiffusionPipeline
import torch
import cudacanvas


def display_tensors(pipe, step, timestep, callback_kwargs):
    # Decode the current latents with the pipeline's VAE and show them on the canvas.
    latents = callback_kwargs["latents"]

    with torch.no_grad():
        image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor, return_dict=False)[0]

    # Rescale to [0, 1] for display.
    image = image - image.min()
    image = image / image.max()

    cudacanvas.im_show(image.squeeze(0))

    # Stop sampling if the window was closed.
    if cudacanvas.should_close():
        cudacanvas.clean_up()
        pipe._interrupt = True

    return callback_kwargs


pipeline = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base",
    torch_dtype=torch.float16,
    variant="fp16"
).to("cuda")

image = pipeline(
    prompt="A croissant shaped like a cute bear.",
    negative_prompt="Deformed, ugly, bad anatomy",
    callback_on_step_end=display_tensors,
    callback_on_step_end_tensor_inputs=["latents"],
).images[0]

cudacanvas.clean_up()
```
Thanks for your response. This is what I meant, except that I need a method that returns a PIL image (or a NumPy array that can be converted to one) so it can be used elsewhere. My use case does not involve displaying to a window.
I also use TAESD because it is fast and good for immediate display.
With PIL and NumPy you incur the CPU transfer, which is not the idea behind this module. You can modify the callback to store the image in a list at each step, then convert it to NumPy or PIL later, based on your requirements.
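A minimal sketch of that suggestion, assuming the callback already has a decoded image tensor in hand. `FrameCollector` and its methods are hypothetical names for illustration and are not part of cudacanvas or diffusers. Frames stay on whatever device they arrive on and are copied to the CPU once, in a single batch, when `to_numpy()` is called. This defers the transfer but does not remove it.

```python
import torch


class FrameCollector:
    """Hypothetical helper: keeps per-step preview frames as tensors, converts later."""

    def __init__(self):
        # uint8 tensors of shape (H, W, 3), left on the device they arrived on
        self.frames = []

    def add(self, image):
        # image: float tensor of shape (1, 3, H, W) or (3, H, W), any value range
        image = image.detach().float()
        if image.dim() == 4:
            image = image[0]
        # Rescale to [0, 1]; the clamp guards against a constant image (max == 0)
        image = image - image.min()
        image = image / image.max().clamp(min=1e-8)
        # Quantize on the source device so the later copy moves 1 byte per value
        frame = (image * 255).round().to(torch.uint8).permute(1, 2, 0)
        self.frames.append(frame)

    def to_numpy(self):
        # One batched device-to-host copy for all stored frames
        if not self.frames:
            return []
        stacked = torch.stack(self.frames).cpu()
        return list(stacked.numpy())
```

Inside a step callback this would be called as `collector.add(image)` in place of the display call. Each returned array has shape `(H, W, 3)` and dtype `uint8`, which `PIL.Image.fromarray` accepts directly.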
Of course. My goal is to eliminate that CPU transfer while still getting the PIL image; otherwise the callback step takes a while (even with TAESD). I benchmarked it a while ago, and the CPU transfer was by far the longest step.
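For anyone who wants to reproduce that kind of measurement, here is a rough sketch of timing the device-to-host copy in isolation. `time_cpu_transfer` is a hypothetical helper, not the benchmark used above. It synchronizes first so that GPU work still in flight is not counted as transfer time.

```python
import time
import torch


def time_cpu_transfer(tensor, repeats=10):
    """Hypothetical helper: mean seconds per `.cpu()` call on `tensor`."""
    # Wait for queued GPU work so it is not attributed to the copy
    if tensor.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        tensor.cpu()  # blocking copy for CUDA tensors; returns self for CPU tensors
    return (time.perf_counter() - start) / repeats
```

Calling it on the decoded image and comparing the result with the time spent in the VAE decode gives a sense of which part dominates the callback.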
This is really cool! It would be useful to have a function like this so we can handle the image directly ourselves instead of writing to a canvas. It could be used to save images from Stable Diffusion quickly, without waiting for the latents to transfer between GPU and CPU. This is especially useful for intermediate progress images, or even SVD (depending on the implementation), since video latents are big.