Longer than 75 token prompts for some models

upstroke-version commented 1 year ago

Some models, like Waifu Diffusion 1.4, officially support prompts longer than 75 tokens. In Waifu Diffusion's case, it supports up to 225 tokens. Does this work with Apple's ml-stable-diffusion?

atiorh commented 1 year ago

Hello! It looks like this model is using the same CLIP text encoder with "max_position_embeddings": 77 as SD v1.4. Could you please provide a pointer to the code segment related to the 225 tokens support?

225 implies 3 context windows with 75 + BOS + EOS. If they run the text encoder three times and merge the resulting text embeddings (e.g. averaging), that would explain the 225 number given this text encoder config.
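
To make the arithmetic concrete, here is a minimal sketch of the windowing step only, assuming the standard CLIP special-token ids (49406 for BOS, 49407 for EOS) and EOS-padding as in the SD v1.x tokenizer. The function name `chunk_prompt` is hypothetical and is not taken from Waifu Diffusion or ml-stable-diffusion:

```python
from typing import List

BOS, EOS = 49406, 49407   # assumed CLIP start/end token ids (SD v1.x tokenizer)
WINDOW = 77               # max_position_embeddings of the text encoder
BODY = WINDOW - 2         # 75 usable tokens per window after BOS and EOS


def chunk_prompt(token_ids: List[int], max_windows: int = 3) -> List[List[int]]:
    """Split raw prompt token ids (no special tokens) into 77-token windows.

    Each window is BOS + up to 75 prompt tokens + EOS, padded with EOS
    (assumption) so that every window has exactly 77 positions.
    """
    ids = token_ids[: BODY * max_windows]  # truncate beyond 3 * 75 = 225 tokens
    windows = []
    for start in range(0, max(len(ids), 1), BODY):
        body = ids[start : start + BODY]
        windows.append([BOS] + body + [EOS] * (WINDOW - 1 - len(body)))
    return windows


if __name__ == "__main__":
    windows = chunk_prompt(list(range(200)))      # a fake 200-token prompt
    print(len(windows), [len(w) for w in windows])
```

Each window can then be passed through the unchanged 77-position text encoder. How the three outputs are merged afterwards (averaging, concatenation, or something else) is a separate choice that this sketch does not cover.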

upstroke-version commented 1 year ago

Sadly, I'm not knowledgeable enough to answer these questions. However, I opened this issue because I saw this tweet:

waifu-diffusion 1.4 supports 225-token prompts (up from 75)
you can stitch 3 CLIP embeddings together and remove special tokens at the seams
use attention masking to avoid attending to the excess
thanks to NovelAI for the technique!

which was written by a developer who I believe is part of the Waifu Diffusion project. These are the related GitHub URLs posted in the tweet:
https://github.com/Birch-san/diffusers-play/commit/e8d2b067bf49488e604b2ddd17d48aeb2fd0df13
https://github.com/Birch-san/diffusers/commit/e3a93e9d80a6b4e5122e5b9d02ad4ee60c7d1354
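
For readers following along, the stitching described in the tweet might look roughly like the sketch below. It is one interpretation of the tweet's wording, not code from the linked commits: the name `stitch_windows`, the choice to keep the first BOS and the last window's trailing positions, and the mask convention (`True` means "may be attended to") are all assumptions.

```python
from typing import List, Tuple

Row = List[float]  # one embedding vector for one token position


def stitch_windows(
    windows: List[List[Row]], body_lengths: List[int]
) -> Tuple[List[Row], List[bool]]:
    """Concatenate per-window text embeddings into one longer sequence.

    windows[i] holds the 77 encoder output rows of window i, and
    body_lengths[i] is the number of real prompt tokens in that window.
    The BOS row is kept only for the first window and the final row only
    for the last window, so interior seams carry no special tokens.
    The mask is False on padding positions (the "excess").
    """
    rows: List[Row] = []
    mask: List[bool] = []
    last = len(windows) - 1
    for i, (emb, n) in enumerate(zip(windows, body_lengths)):
        start = 0 if i == 0 else 1                      # drop BOS at a seam
        stop = len(emb) if i == last else len(emb) - 1  # drop final row at a seam
        for pos in range(start, stop):
            rows.append(emb[pos])
            # positions 0..n+1 are BOS, prompt tokens, and the first EOS
            mask.append(pos <= n + 1)
    return rows, mask


if __name__ == "__main__":
    # Fake 1-dimensional "embeddings" so the row bookkeeping is easy to follow.
    fake = [[[float(w * 100 + p)] for p in range(77)] for w in range(3)]
    rows, mask = stitch_windows(fake, [75, 75, 50])
    print(len(rows), sum(mask))
```

With three windows this yields 76 + 75 + 76 = 227 positions: one BOS, 225 prompt slots, and one trailing position. The mask would then be supplied to the denoiser's cross-attention so that padding is ignored.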

Also, this tweet

waifu-diffusion 1.4 epoch 1 is out! supports non-square aspect ratios, triple prompt length. includes text encoder and VAE.

was retweeted by the main developer of Waifu Diffusion, so I naturally assumed it was official support, but I might be wrong.

Sorry for my ignorance.

MushR00m commented 1 year ago

I checked this community pipeline, and it seems to support long prompts:
https://github.com/huggingface/diffusers/blob/main/examples/community/lpw_stable_diffusion.py
This is the commit history:
https://github.com/huggingface/diffusers/commit/2a0c823527694058d410ed6f91b52e7dd9f94ebe

apple / ml-stable-diffusion

Longer than 75 token prompts for some models #94