damian0815 / compel

A prompting enhancement library for transformers-type text embedding systems

[Reporting a Bug] Error when prompt contains "!" #94

Open sk-uma opened 5 months ago

sk-uma commented 5 months ago

When using SDXL, an error occurs if a prompt contains too many "!" characters.

The minimal code that reproduces the problem is below.

from diffusers import StableDiffusionXLPipeline
import torch
from compel import Compel, ReturnedEmbeddingsType

prompt = "!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!"

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-3.1",
    torch_dtype=torch.float16, 
    use_safetensors=True, 
).to("cuda")

compel = Compel(
    tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2],
    text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[False, True],
    truncate_long_prompts=False,
)

compel.build_conditioning_tensor(prompt)

Running this produces the following error:

Traceback (most recent call last):
  File "/home/sk-uma/create_dataset/test_compel.py", line 26, in <module>
    compel.build_conditioning_tensor(prompt)
  File "/opt/conda/lib/python3.10/site-packages/compel/compel.py", line 112, in build_conditioning_tensor
    conditioning, _ = self.build_conditioning_tensor_for_conjunction(conjunction)
  File "/opt/conda/lib/python3.10/site-packages/compel/compel.py", line 186, in build_conditioning_tensor_for_conjunction
    this_conditioning, this_options = self.build_conditioning_tensor_for_prompt_object(p)
  File "/opt/conda/lib/python3.10/site-packages/compel/compel.py", line 218, in build_conditioning_tensor_for_prompt_object
    return self._get_conditioning_for_flattened_prompt(prompt), {}
  File "/opt/conda/lib/python3.10/site-packages/compel/compel.py", line 282, in _get_conditioning_for_flattened_prompt
    return self.conditioning_provider.get_embeddings_for_weighted_prompt_fragments(
  File "/opt/conda/lib/python3.10/site-packages/compel/embeddings_provider.py", line 535, in get_embeddings_for_weighted_prompt_fragments
    text_embeddings = torch.cat(text_embeddings_list, dim=-1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 77 but got size 154 for tensor number 1 in the list.

The problem is due to the difference between SDXL's tokenizer and tokenizer_2: the pad_token of tokenizer is <|endoftext|>, while the pad_token of tokenizer_2 is !. These tokenizers also treat consecutive !s as one token. For this reason, the number of tokens produced by tokenizer and tokenizer_2 differs, and the error occurs.
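
One way to see the divergence directly is to compare the two tokenizers on the offending prompt. The snippet below is a minimal sketch that assumes the pipeline and prompt from the repro above are still in scope and relies only on standard transformers tokenizer attributes:

# Inspect the pad tokens and the tokenized lengths of the all-"!" prompt.
tok1, tok2 = pipeline.tokenizer, pipeline.tokenizer_2

print(tok1.pad_token, tok1.pad_token_id)  # <|endoftext|>
print(tok2.pad_token, tok2.pad_token_id)  # !

ids1 = tok1(prompt).input_ids
ids2 = tok2(prompt).input_ids
print(len(ids1), len(ids2))  # per the analysis above, these two lengths differ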

The simplest workaround is to use the same tokenizer for both encoders:

compel = Compel(
    tokenizer=[pipeline.tokenizer, pipeline.tokenizer],
    text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[False, True],
    truncate_long_prompts=False,
)
Clement-Lelievre commented 4 months ago

using compel==2.0.3 (latest as of today)

I've had a similar case where the part of the prompt that's tokenized differently across the two SDXL tokenizers is !': tokenizer 1 encodes it as [13222], while tokenizer 2 encodes it as [0, 262].

However, just because the token sequence lengths differ and both exceed 77 doesn't mean that the call to compel will raise an error like the RuntimeError above; I've had plenty of counter-examples. For example, the prompt "!'" * 50 builds token sequences of lengths 101 and 102 respectively, but works without error.
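
For reference, both observations can be checked directly. The following is a sketch assuming an SDXL pipeline loaded as in the first snippet; add_special_tokens=False drops the BOS/EOS markers, so the printed counts may differ slightly from the figures quoted above:

# How the two tokenizers split the fragment "!'":
print(pipeline.tokenizer("!'", add_special_tokens=False).input_ids)    # quoted above as [13222]
print(pipeline.tokenizer_2("!'", add_special_tokens=False).input_ids)  # quoted above as [0, 262]

# Token counts for the counter-example prompt; both overflow 77, but they
# still need the same number of 77-token chunks, so compel's cat succeeds.
counter_example = "!'" * 50
print(len(pipeline.tokenizer(counter_example).input_ids),
      len(pipeline.tokenizer_2(counter_example).input_ids))  # quoted above as 101 and 102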

So something else breaks it: the shape of the embeddings. Typically the mismatch will be 77 vs 154 (2 * 77).

This method is responsible: https://github.com/damian0815/compel/blob/v2.0.3/src/compel/embeddings_provider.py#L282

The call to compel breaks when the two tokenizers' token sequences require a different number of 77-token chunks. In that case, one encoder exits the while loop referenced above with a shape of 77*k while the other exits with a shape of 77*(k+1) or 77*(k-1), and that breaks later, specifically in get_embeddings_for_weighted_prompt_fragments, when attempting text_embeddings = torch.cat(text_embeddings_list, dim=-1)
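
The shape mismatch itself can be reproduced in isolation. Below is a minimal illustration of the failing concatenation; the 768 and 1280 hidden sizes are the usual widths of the two SDXL text encoders and are used here purely as stand-ins:

import torch

# One encoder path ends the chunking loop with a single 77-token chunk, the other
# with two chunks stacked to 154 tokens; concatenating along the hidden dimension fails.
emb_1 = torch.randn(77, 768)    # stand-in for the first encoder's output
emb_2 = torch.randn(154, 1280)  # stand-in for the second encoder's output
torch.cat([emb_1, emb_2], dim=-1)
# RuntimeError: Sizes of tensors must match except in dimension 1.
# Expected size 77 but got size 154 for tensor number 1 in the list.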

Having said that, the responsibility does not seem to lie with compel, nor with transformers (the library holding the CLIPTokenizer class). As stated above, it comes from the fact that the pad_tokens differ across the two tokenizers, as can be seen in their tokenizer config files.

cc @damian0815

damian0815 commented 4 months ago

good work and thank you for tracking down the numbers. i've long suspected there's some weird edge case issue but haven't had a good repro for it

Clement-Lelievre commented 4 months ago

> good work and thank you for tracking down the numbers. i've long suspected there's some weird edge case issue but haven't had a good repro for it

Hi @damian0815 below is a repro snippet:

from compel import Compel
from diffusers import StableDiffusionXLPipeline

pipeline = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")

compel = Compel(
    tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2],
    text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
    truncate_long_prompts=False,
)
prompt = "3" * 74 + "!'"
compel([prompt])

This 76-character prompt yields 77 tokens with tokenizer 1 and 78 with tokenizer 2, hence the latter will produce two batches of 77, i.e. a shape of 154 vs 77.
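
A quick way to confirm those counts (a sketch, assuming the pipeline and prompt defined in the snippet above):

# The boundary case: one tokenizer stays within a single 77-token chunk,
# the other needs a second one.
print(len(pipeline.tokenizer(prompt).input_ids))    # 77 -> one chunk
print(len(pipeline.tokenizer_2(prompt).input_ids))  # 78 -> two chunks, i.e. 154 after padding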