damian0815 / compel

A prompting enhancement library for transformers-type text embedding systems
MIT License

Blend for Interpolation #67

Open aduchon opened 1 year ago

aduchon commented 1 year ago

I'm using blend to interpolate between two images, so the weights go from 1.0 to 0.0 for one prompt and from 0.0 to 1.0 for the other. The problem is that the images at the endpoints come out different when using blend (or and) than when passing the prompt on its own. In testing, I'm only using one of these two lines:

conditioning_refiner, pooled_refiner = compel_refiner(prompt_a)
conditioning_refiner, pooled_refiner = compel_refiner(f'("{prompt_a}", "{prompt_b}").blend(1.0, 0.0)')

Compel with just the prompt gives me what I want, but blend() changes the image a bit, and for the worse.
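For reference, the interpolation described above can be sketched as a list of blend prompt strings, one per frame. This is only a sketch: `blend_prompts` and `steps` are hypothetical names, not part of compel, and the prompts are assumed to contain no double quotes. The string format is the same `.blend()` syntax used in the second line above.

```python
def blend_prompts(prompt_a, prompt_b, steps):
    """Build one blend prompt per frame, moving the weight from prompt_a to prompt_b.

    Hypothetical helper for illustration; it only formats strings and does not call compel.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    prompts = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the first frame, 1.0 at the last
        prompts.append(
            f'("{prompt_a}", "{prompt_b}").blend({1.0 - t:.3f}, {t:.3f})'
        )
    return prompts


for p in blend_prompts("a cat", "a dog", 3):
    print(p)
```

The first and last entries are the `blend(1.000, 0.000)` and `blend(0.000, 1.000)` endpoints, which are the cases where I'd expect the same image as passing `prompt_a` or `prompt_b` alone.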

aduchon commented 1 year ago

Here's the setup, using SDXL

  import torch
  from compel import Compel, ReturnedEmbeddingsType
  from diffusers import StableDiffusionXLImg2ImgPipeline, StableDiffusionXLPipeline

  base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    use_safetensors=True,
    torch_dtype=torch.float16,
    add_watermarker=False, # no watermarker
    safety_checker=None,
  )
  # memory optimization
  base.enable_model_cpu_offload()
  base.enable_vae_slicing()

  compel_base = Compel(
    tokenizer=[base.tokenizer, base.tokenizer_2],
    text_encoder=[base.text_encoder, base.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[False, True]
  )

  # set up refiner on cuda
  refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,
    vae=base.vae,
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
    add_watermarker=False, # no watermarker
    safety_checker=None,
  )
  refiner.enable_model_cpu_offload()
  refiner.enable_vae_slicing()

  # and its compel
  # https://github.com/damian0815/compel/issues/53
  compel_refiner = Compel(
    tokenizer=[refiner.tokenizer_2],
    text_encoder=[refiner.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[True],
  )

damian0815 commented 1 year ago

Can you check whether it's pooled_refiner that is acting differently? You can subtract the conditioning tensors from the two calls and check whether the result is all 0s.
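
A minimal sketch of that check, assuming PyTorch tensors. `report_difference` is a hypothetical helper, and the tensors below are stand-ins with illustrative shapes so the snippet runs on its own. In the setup above they would be the two `(conditioning, pooled)` pairs returned by `compel_refiner` for the plain prompt and for the `blend(1.0, 0.0)` prompt.

```python
import torch


def report_difference(name, plain, blended):
    """Print and return the largest absolute element-wise difference (hypothetical helper)."""
    if plain.shape != blended.shape:
        # A shape mismatch is itself a finding: subtraction would fail or broadcast.
        print(f"{name}: shapes differ, {tuple(plain.shape)} vs {tuple(blended.shape)}")
        return None
    # Compare in float32 so the subtraction itself is not done in fp16.
    max_diff = (plain.float() - blended.float()).abs().max().item()
    print(f"{name}: max abs diff = {max_diff:.6g}")
    return max_diff


# Stand-in tensors (assumed shapes, for illustration only).
cond_plain = torch.zeros(1, 77, 1280)
cond_blend = cond_plain.clone()        # identical, so the difference is 0
pooled_plain = torch.zeros(1, 1280)
pooled_blend = pooled_plain + 0.5      # deliberately different

report_difference("conditioning", cond_plain, cond_blend)
report_difference("pooled", pooled_plain, pooled_blend)
```

If the conditioning difference is 0 but the pooled one is not (or the other way round), that narrows down which of the two outputs blend is changing. With fp16 weights a tiny nonzero difference may just be rounding, so the size of the number matters more than whether it is exactly 0.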