Open · sk-uma opened this issue 5 months ago
Using compel==2.0.3 (latest as of today).
I've had a similar case where the part of the prompt that's tokenized differently across the two SDXL tokenizers is `!'`:
tokenizer 1 encodes it as [13222]
tokenizer 2 encodes it as [0, 262]
However, just because the two token sequences have different lengths, both above 77, doesn't mean that the call to compel will raise an error like the RuntimeError above; I've had plenty of counter-examples. For example, the prompt:
"!'" * 50
builds token sequences of lengths 101 and 102 respectively, yet works without error.
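The token ids and lengths above can be double-checked directly with the two tokenizers from the SDXL base checkpoint; a minimal sketch, assuming the `stabilityai/stable-diffusion-xl-base-1.0` repo is reachable (the values in the comments are the ones quoted in this comment):

```python
from transformers import CLIPTokenizer

tok_1 = CLIPTokenizer.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="tokenizer")
tok_2 = CLIPTokenizer.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="tokenizer_2")

# how each tokenizer splits the bare fragment "!'" (no BOS/EOS added)
print(tok_1("!'", add_special_tokens=False).input_ids)  # [13222]
print(tok_2("!'", add_special_tokens=False).input_ids)  # [0, 262]

# full-prompt lengths (BOS/EOS included) for the counter-example prompt
prompt = "!'" * 50
print(len(tok_1(prompt).input_ids))  # 101
print(len(tok_2(prompt).input_ids))  # 102
```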
So there's something else that breaks it: it is the shape of the embeddings; typically the mismatch will be 77 vs 154 (2*77).
This method is responsible: https://github.com/damian0815/compel/blob/v2.0.3/src/compel/embeddings_provider.py#L282
The call to compel breaks when the two tokenizers end up padding to different multiples of 77: one exits the while loop referenced above with a shape of 77*k while the other exits with 77*(k+1) or 77*(k-1), and that breaks later, specifically in `get_embeddings_for_weighted_prompt_fragments`, when attempting `text_embeddings = torch.cat(text_embeddings_list, dim=-1)`.
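For illustration, a standalone sketch (not compel's actual code) of why those shapes cannot be concatenated: the two encoders' outputs are joined along the last (hidden) dimension, so the sequence dimensions have to agree, and 77 vs 154 does not. The hidden sizes 768 and 1280 are the usual SDXL text-encoder sizes and are assumed here:

```python
import torch

emb_1 = torch.randn(1, 77, 768)    # padded to a single chunk of 77
emb_2 = torch.randn(1, 154, 1280)  # padded to two chunks of 77 (2 * 77 = 154)

# concatenating along dim=-1 requires every other dimension to match,
# so this raises a RuntimeError because 77 != 154
text_embeddings = torch.cat([emb_1, emb_2], dim=-1)
```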
Having said that, the responsibility does not seem to be on compel's side nor on transformers' side (the library holding the `CLIPTokenizer` class). It seems to come from, as stated above, the fact that the `pad_token`s differ across the two tokenizers, as can be seen in their config files.
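The mismatch is also visible directly on the tokenizer objects, without opening the config files; a quick sketch, again assuming the standard SDXL base checkpoint:

```python
from transformers import CLIPTokenizer

tok_1 = CLIPTokenizer.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="tokenizer")
tok_2 = CLIPTokenizer.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="tokenizer_2")

print(repr(tok_1.pad_token))  # '<|endoftext|>'
print(repr(tok_2.pad_token))  # '!'
```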
cc @damian0815
good work and thank you for tracking down the numbers. i've long suspected there's some weird edge case issue but haven't had a good repro for it
Hi @damian0815, below is a repro snippet:
from compel import Compel
from diffusers import StableDiffusionXLPipeline

pipeline = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")
compel = Compel(
    tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2],
    text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
    truncate_long_prompts=False,
)

prompt = "3" * 74 + "!'"
compel([prompt])
This 76-character prompt yields 77 tokens with tokenizer 1 and 78 with tokenizer 2, hence the latter will produce two chunks of 77, i.e. a shape of 154 vs 77.
When using SDXL, an error occurs if a prompt contains too many "!" characters. The minimal code that reproduces the problem is below. Running it produces the following error.
The problem is due to the difference between SDXL's `tokenizer` and `tokenizer_2`: the `pad_token` of `tokenizer` is `<|endoftext|>`, while the `pad_token` of `tokenizer_2` is `!`. These tokenizers also treat consecutive `!`s as one token. For this reason, the number of tokens in the processing results of `tokenizer` and `tokenizer_2` is different and an error occurs. The simplest solution is to load a similar tokenizer.