RoyiRa / prompt-to-prompt-with-sdxl

An implementation of the Prompt-to-Prompt paper for the SDXL architecture

Compatibility with Playground v2.5 #5

Open edixiong opened 3 months ago

edixiong commented 3 months ago

Thanks for the great repo. I am working on integrating this pipeline with Playground v2.5 (https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic), which appears to use the same pipeline as SDXL (StableDiffusionXLPipeline). Loading the model directly with .from_pretrained does not raise an error, but the output image has an artifact: it comes out gray. Do you have any idea what modifications are needed here?

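Attached image: download-11.png (https://github.com/RoyiRa/prompt-to-prompt-with-sdxl/assets/60495766/547c6cc1-13da-410b-8aee-a72108acc89b)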

RoyiRa commented 3 months ago

Is the output image the result of simply using the model or does the P2P method cause this issue?

edixiong commented 3 months ago

Hi Roy, thanks for the reply. The gray output image is the result of the P2P pipeline. Normal generation with the standard pipeline, loaded with pipe = DiffusionPipeline.from_pretrained(), works fine.

However, with the P2P pipeline (https://github.com/RoyiRa/prompt-to-prompt-with-sdxl/blob/main/prompt_to_prompt_pipeline.py), the output image comes out gray. For example, with the following code:

import torch
from IPython.display import display
from prompt_to_prompt_pipeline import Prompt2PromptPipeline

seed = 10002
g_cpu = torch.Generator().manual_seed(seed)

# The fp16 weights are moved to the GPU, so fail early if CUDA is unavailable.
if not torch.cuda.is_available():
    raise RuntimeError("This example requires a CUDA device.")
device = torch.device("cuda:0")

pipe = Prompt2PromptPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
    variant="fp16",
).to(device)

# The first prompt is the source; the second is the edited prompt.
prompts = ["a pink bear riding a bicycle on the beach", "a pink dragon riding a bicycle on the beach"]
cross_attention_kwargs = {
    "edit_type": "replace",
    "n_self_replace": 0.4,
    "n_cross_replace": {"default_": 1.0, "dragon": 0.4},
}

image = pipe(prompts, cross_attention_kwargs=cross_attention_kwargs, generator=g_cpu)
print(f"Num images: {len(image['images'])}")
for img in image["images"]:
    display(img)

The output image is gray, like the one I shared before.

The normal SDXL pipeline produces a normal image, like the one below: download-12

RoyiRa commented 3 months ago

I didn't come across it when I worked on the pipeline, but my intuition is that it is related to the VAE and maybe the use of float16. Did you try looking at these?

edixiong commented 3 months ago

Thanks for your suggestion. However, I don't think the VAE or float16 is causing the issue. The VAE is loaded correctly (AutoencoderKL), and the normal pipeline also uses float16.

The code that uses the normal pipeline to create the normal image is:

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
prompt = "a pink bear riding a bicycle on the beach"
image = pipe(prompt=prompt, num_inference_steps=50).images[0]

and it uses float16 as well.
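
Since both pipelines load the same checkpoint in float16, one way to narrow this down might be to compare what each loaded pipeline actually contains, side by side. Below is a minimal sketch of such a comparison. The helper names (summarize_components, diff_summaries, make_stand_in) are hypothetical, and checking vae.config for latents_mean and latents_std is an assumption about which VAE settings could matter here, not a confirmed cause. The stand-in objects and their values are made up so the sketch runs without downloading weights.

```python
from types import SimpleNamespace


def summarize_components(pipe):
    """Collect settings that could plausibly differ between two pipelines."""
    vae_config = pipe.vae.config
    return {
        "pipeline": type(pipe).__name__,
        "scheduler": type(pipe.scheduler).__name__,
        "vae": type(pipe.vae).__name__,
        "vae_dtype": str(getattr(pipe.vae, "dtype", None)),
        "scaling_factor": getattr(vae_config, "scaling_factor", None),
        # Assumption: these optional VAE config fields are worth comparing.
        "has_latents_mean": getattr(vae_config, "latents_mean", None) is not None,
        "has_latents_std": getattr(vae_config, "latents_std", None) is not None,
    }


def diff_summaries(a, b):
    """Return only the keys whose values differ, as {key: (a_value, b_value)}."""
    return {k: (a[k], b.get(k)) for k in a if a[k] != b.get(k)}


def make_stand_in(scheduler_name, vae_config):
    """Build a fake pipeline-like object (illustrative values only)."""
    pipe = type("StandInPipeline", (), {})()
    pipe.scheduler = type(scheduler_name, (), {})()
    pipe.vae = type("AutoencoderKL", (), {})()
    pipe.vae.config = SimpleNamespace(**vae_config)
    pipe.vae.dtype = "float16"
    return pipe


# Made-up stand-ins; with real pipelines, pass the two loaded `pipe` objects.
stock = make_stand_in(
    "SchedulerA",
    {"scaling_factor": 0.5, "latents_mean": [0.0], "latents_std": [1.0]},
)
p2p = make_stand_in("SchedulerB", {"scaling_factor": 0.5})

differences = diff_summaries(summarize_components(stock), summarize_components(p2p))
print(differences)
```

With real pipelines, the two arguments would be the DiffusionPipeline and Prompt2PromptPipeline objects from the snippets above. An empty result would suggest the components match and that the difference is more likely in how the P2P pipeline's call logic uses them.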