damian0815 / compel

A prompting enhancement library for transformers-type text embedding systems

Compel fails with TypeError: 'NoneType' object cannot be interpreted as an integer when using T5Tokenizer #82

Open davemssavage opened 7 months ago

davemssavage commented 7 months ago

This issue likely affects any tokenizer that does not define a bos_token, of which T5Tokenizer is an example:

Traceback (most recent call last):
  File "/workspaces/diffuser-tests/compel-test.py", line 9, in <module>
    prompt_embeds = compel("An astronaut riding a green+ horse")
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/compel/compel.py", line 135, in __call__
    output = self.build_conditioning_tensor(text_input)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/compel/compel.py", line 112, in build_conditioning_tensor
    conditioning, _ = self.build_conditioning_tensor_for_conjunction(conjunction)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/compel/compel.py", line 186, in build_conditioning_tensor_for_conjunction
    this_conditioning, this_options = self.build_conditioning_tensor_for_prompt_object(p)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/compel/compel.py", line 218, in build_conditioning_tensor_for_prompt_object
    return self._get_conditioning_for_flattened_prompt(prompt), {}
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/compel/compel.py", line 282, in _get_conditioning_for_flattened_prompt
    return self.conditioning_provider.get_embeddings_for_weighted_prompt_fragments(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/compel/embeddings_provider.py", line 119, in get_embeddings_for_weighted_prompt_fragments
    tokens, per_token_weights, mask = self.get_token_ids_and_expand_weights(fragments, weights, device=device)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/compel/embeddings_provider.py", line 280, in get_token_ids_and_expand_weights
    return self._chunk_and_pad_token_ids(all_token_ids, all_token_weights, device=device)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vscode/.local/lib/python3.11/site-packages/compel/embeddings_provider.py", line 318, in _chunk_and_pad_token_ids
    all_token_ids_tensor = torch.tensor(all_token_ids, dtype=torch.long, device=device)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object cannot be interpreted as an integer
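
For context, a quick check (here assuming the stock t5-small tokenizer) shows where the None comes from:

from transformers import AutoTokenizer

# T5 defines pad and EOS tokens but no BOS token, so bos_token_id is None;
# torch.tensor([None, ...]) then raises the TypeError above.
tok = AutoTokenizer.from_pretrained("t5-small")
print(tok.bos_token_id)  # None
print(tok.pad_token_id)  # 0
print(tok.eos_token_id)  # 1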

Steps to reproduce: use PixArt-alpha/PixArt-XL-2-1024-MS, which uses T5Tokenizer:

from diffusers import AutoPipelineForText2Image
from compel import Compel

pipe = AutoPipelineForText2Image.from_pretrained("PixArt-alpha/PixArt-XL-2-1024-MS", use_safetensors=True)
pipe.to("cpu")

compel = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder, device="cpu")
prompt_embeds = compel("An astronaut riding a green+ horse")  # fails with the TypeError above
neg_prompt_embeds = compel("painting++")

result = pipe(
    prompt=None,
    prompt_embeds=prompt_embeds,
    negative_prompt=None,
    negative_prompt_embeds=neg_prompt_embeds,
    num_images_per_prompt=1,
    num_inference_steps=15,
    height=1024,
    width=1024,
    output_type="pil"
)

result.images[0].save('image.png', "PNG")

Debugging suggests the problem lies in compel's embeddings_provider.py (the original issue links two locations there). A simple, hacky fix is to replace:

[self.tokenizer.bos_token_id]

with

([self.tokenizer.pad_token_id] if self.tokenizer.bos_token_id is None else [self.tokenizer.bos_token_id])
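
As a sketch of that fallback in isolation (hypothetical helper name, not actual Compel code):

def leading_token_ids(tokenizer):
    # Tokenizers like T5's define no BOS token; fall back to pad_token_id,
    # which T5 also uses as its decoder start token (see the docs note below).
    if tokenizer.bos_token_id is None:
        return [tokenizer.pad_token_id]
    return [tokenizer.bos_token_id]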

However, I suspect this logic will only work for T5Tokenizer, which has the following note in its docs:

Note that T5 uses the pad_token_id as the decoder_start_token_id, so when doing generation without using [generate()](https://huggingface.co/docs/transformers/v4.37.2/en/main_classes/text_generation#transformers.GenerationMixin.generate), make sure you start it with the pad_token_id.
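
That note is easy to confirm from the model config (again assuming t5-small):

from transformers import T5Config

# For T5, decoder_start_token_id and pad_token_id are the same token (0).
cfg = T5Config.from_pretrained("t5-small")
print(cfg.decoder_start_token_id, cfg.pad_token_id)  # 0 0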

Unfortunately, even with that fix it still does not work with PixArt-Alpha: that pipeline expects an attention mask when embeds are passed in directly, and supporting that would need wider changes to Compel.
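
For reference, PixArtAlphaPipeline takes prompt_attention_mask / negative_prompt_attention_mask alongside the embeds. Compel does not return a mask, so a workaround would have to rebuild one from the tokenizer; the sketch below assumes the rebuilt mask lines up with Compel's padding, which may not hold for long, chunked prompts:

def attention_mask_for(tokenizer, text, length):
    # Re-tokenize with fixed-length padding so the mask matches the
    # sequence dimension of the prompt embeds (an assumption, see above).
    enc = tokenizer(text, max_length=length, padding="max_length",
                    truncation=True, return_tensors="pt")
    return enc.attention_mask

prompt_mask = attention_mask_for(pipe.tokenizer,
                                 "An astronaut riding a green horse",
                                 prompt_embeds.shape[1])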

davemssavage commented 7 months ago

Rather than large-scale changes, it might be better in the short term to add a type check in Compel's constructor asserting that the tokenizers are CLIPTokenizers, as per the method signature; Python doesn't actually enforce the annotation at runtime, hence the confusing downstream errors. Perhaps an update to the README.md would be useful as well: it seems to suggest compel will work with any tokenizer, but that is evidently not the case.
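
A minimal sketch of that guard (hypothetical, not in Compel today):

from transformers import CLIPTokenizer, CLIPTokenizerFast

class Compel:
    def __init__(self, tokenizer, text_encoder, **kwargs):
        # Fail fast with a clear message instead of a confusing TypeError
        # deep inside embeddings_provider.py.
        if not isinstance(tokenizer, (CLIPTokenizer, CLIPTokenizerFast)):
            raise TypeError("Compel currently supports CLIP tokenizers only; "
                            f"got {type(tokenizer).__name__}")
        ...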

damian0815 commented 5 months ago

thanks for the comments, yes, type checking would probably be a good idea.