dakenf / diffusers.js

diffusers implementation for node.js and browser
https://islamov.ai/diffusers.js/

Infinite Prompt Length Feature #19

Closed · jdp8 closed this 2 months ago

jdp8 commented 4 months ago

Problem

Currently, there is a limit to the number of tokens that can be passed to the CLIP Text Encoder (usually 77 tokens), as explained here. If an input prompt contains more than the maximum number of tokens, the following error is shown:

[Screenshot: error message for a prompt exceeding the maximum token length]
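For reference, the failure is simply a hard cap on the token count. A minimal sketch of the check, assuming CLIP's 77-token context window (the function name is hypothetical, not part of the diffusers.js API):

```typescript
// CLIP's context window, including the BOS and EOS special tokens.
const MAX_LENGTH = 77;

// Hypothetical guard illustrating the failing condition; not library code.
function assertPromptFits(tokenIds: number[]): void {
  if (tokenIds.length > MAX_LENGTH) {
    throw new Error(
      `Prompt is ${tokenIds.length} tokens; the text encoder accepts at most ${MAX_LENGTH}.`
    );
  }
}
```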

Solution

To overcome this limit and accept longer prompts, AUTOMATIC1111 uses this solution, which consists of breaking the prompt tokens into chunks, encoding each chunk separately, and concatenating the encoded chunks into a single Tensor before passing it to the UNET model. Here is another useful explanation of the approach.
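Roughly, the chunk-encode-concatenate step looks like the sketch below. This is an illustration, not the exact code in this PR: `encodeChunk` stands in for the call into the CLIP text encoder (which re-adds BOS/EOS and pads each chunk back to 77 tokens), and the chunk size of 75 leaves room for those two special tokens.

```typescript
const CHUNK_SIZE = 75; // 77 minus the BOS and EOS tokens

// Split the full token list into runs of at most CHUNK_SIZE tokens.
function splitIntoChunks(tokenIds: number[]): number[][] {
  const chunks: number[][] = [];
  for (let i = 0; i < tokenIds.length; i += CHUNK_SIZE) {
    chunks.push(tokenIds.slice(i, i + CHUNK_SIZE));
  }
  return chunks;
}

// Encode each chunk independently, then concatenate the results along the
// sequence axis so the UNET receives one long embedding tensor.
async function encodeLongPrompt(
  tokenIds: number[],
  encodeChunk: (chunk: number[]) => Promise<Float32Array>, // [77 * hiddenSize]
  hiddenSize: number,
): Promise<Float32Array> {
  const chunks = splitIntoChunks(tokenIds);
  const encoded = await Promise.all(chunks.map(encodeChunk));
  const out = new Float32Array(encoded.length * 77 * hiddenSize);
  encoded.forEach((e, i) => out.set(e, i * 77 * hiddenSize));
  return out;
}
```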

One important detail: to make this work, I had to ensure that the prompt and the negative prompt had the same token length; otherwise, concatenating their Tensors would raise an error. There is no need to break the prompt into chunks if its token length doesn't exceed the Tokenizer's model max length.
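A sketch of that equalization step, assuming the shorter token list is padded with the tokenizer's pad/EOS id (49407 for the standard CLIP tokenizer) so that both prompts split into the same number of chunks; the constant and helper are illustrative, not the PR's actual code:

```typescript
// Pad/EOS id of the standard CLIP tokenizer (an assumption; check your tokenizer).
const PAD_TOKEN_ID = 49407;

// Pad the shorter of the two token lists so both encode to tensors of the
// same sequence length and can be concatenated without a shape mismatch.
function padToSameLength(a: number[], b: number[]): [number[], number[]] {
  const target = Math.max(a.length, b.length);
  const pad = (ids: number[]) =>
    ids.concat(new Array(target - ids.length).fill(PAD_TOKEN_ID));
  return [pad(a), pad(b)];
}
```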

Long Prompt Results

Before this change, the following long prompts would fail; now they produce the images below (generated with the LCM Pipeline):

[Generated image: "inspired" prompt]

[Generated image: "fantasy" prompt]

Other

[Additional generated images]

kungfooman commented 2 months ago

Wow, great job! 🥇

I was running into the same issue with overly long prompts. What makes it even more annoying is that once the error occurs, the React state is wrecked and you can't simply continue. Thank you very much; I'm merging this into my local fork.

jdp8 commented 2 months ago

@kungfooman Thank you so much! Glad that my changes were of use to you 😄