damian0815 / compel

A prompting enhancement library for transformers-type text embedding systems
MIT License

ValueError: `prompt_embeds` and `negative_prompt_embeds` must have the same shape when passed directly, but got: `prompt_embeds` torch.Size([1, 154, 2048]) != `negative_prompt_embeds` torch.Size([1, 77, 2048]). #75

Closed ynie closed 9 months ago

ynie commented 9 months ago

Hey, thanks again for the framework. The following code works fine on my local computer, but it fails on a Replicate server (remote GPU) with this error:

ValueError: `prompt_embeds` and `negative_prompt_embeds` must have the same shape when passed directly, but got: `prompt_embeds` torch.Size([1, 154, 2048]) != `negative_prompt_embeds` torch.Size([1, 77, 2048]).

What did I miss? Thanks!

self._background_image_pipe = StableDiffusionXLControlNetImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=[self.depth_control_net.controlnet],
    torch_dtype=torch.float16,
    variant="fp16",
    cache_dir=transformers_utils.get_ml_model_catch_path()
).to(transformers_utils.get_device_type())

compel = Compel(tokenizer=[self._background_image_pipe.tokenizer, self._background_image_pipe.tokenizer_2],
                text_encoder=[self._background_image_pipe.text_encoder, self._background_image_pipe.text_encoder_2],
                returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
                requires_pooled=[False, True],
                truncate_long_prompts=False)
positive_conditioning, positive_pooled = compel(lora_style.style_prompt(text_prompt))

if negative_prompt is None:
    negative_prompt = ("out of frame, text, error, cropped, jpeg artifacts, out of frame, extra fingers, "
                       "mutated hands, poorly drawn hands, poorly drawn face, blurry, bad anatomy, malformed "
                       "limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many "
                       "fingers, long neck, username, watermark, signature.")
negative_conditioning, negative_pooled = compel(negative_prompt)

images = self._background_image_pipe(
    prompt_embeds=positive_conditioning,
    pooled_prompt_embeds=positive_pooled,
    negative_prompt_embeds=negative_conditioning,
    negative_pooled_prompt_embeds=negative_pooled,
    control_image=[depth_image],
    image=reference_image,
    generator=generator,
    num_inference_steps=num_inference_steps,  # steps between 15 and 30 work well for us
    strength=denoising_strength,  # make sure to use `strength` below 1.0. (SDXL has issues when strength is 1.0)
    guidance_scale=guidance_scale,  # how close to follow the prompt(aka. classifier-free guidance scale)
).images
ynie commented 9 months ago

Ah, I didn't follow the examples on the website. Both prompts should be encoded in a single call: `conditioning, pooled = compel([positive_prompt, negative_prompt])`
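
For anyone landing here with the same error: 154 is 2 × 77, which suggests the positive prompt spilled into a second 77-token chunk (allowed by `truncate_long_prompts=False`) while the negative prompt fit in one. Below is a minimal sketch of the batched call from the resolution above. The helper name `encode_prompt_pair` is hypothetical, and it assumes `compel` is an SDXL-configured `Compel` instance built with `requires_pooled=[False, True]`, so that calling it returns a `(conditioning, pooled)` pair with one row per prompt.

```python
def encode_prompt_pair(compel, positive_prompt, negative_prompt):
    """Encode both prompts in one call so their embeddings share a sequence length.

    Hypothetical helper: `compel` is assumed to be a callable that takes a list
    of prompts and returns (conditioning, pooled), each batched along dim 0.
    """
    # One batched call. With truncate_long_prompts=False, Compel is expected to
    # pad the shorter prompt's embedding to the length of the longer one.
    conditioning, pooled = compel([positive_prompt, negative_prompt])

    # Slice (rather than index) so each result keeps a leading batch dim of 1.
    return {
        "prompt_embeds": conditioning[0:1],
        "pooled_prompt_embeds": pooled[0:1],
        "negative_prompt_embeds": conditioning[1:2],
        "negative_pooled_prompt_embeds": pooled[1:2],
    }
```

The returned dict can be unpacked straight into the pipeline call, for example `self._background_image_pipe(**encode_prompt_pair(compel, prompt, negative_prompt), image=reference_image, ...)`. If the two prompts have to be encoded separately, the compel README also describes `compel.pad_conditioning_tensors_to_same_length([...])` for aligning the conditioning tensors afterwards; check the installed version before relying on it.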