AUTOMATIC1111 / stable-diffusion-webui-tokenizer

An extension for stable-diffusion-webui that adds a tab for previewing how the CLIP model would tokenize your text.

What do the colors on each token mean? #7

Open · danielos123ssa opened this issue 1 year ago

danielos123ssa commented 1 year ago

I wish this were explained more clearly, because I'm interested. Is there a legend of some sort?

zero41120 commented 3 months ago

I don't have the expertise to confirm exactly how the neural network processes the text, but I can explain how the text is broken down into numbers (tokens) for the network.

For example, the word emotionless is actually 3 tokens to the network (emo, tion, less). By default, SD processes 75 tokens at a time, as explained in a1111's wiki.
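A minimal sketch of the 75-token chunking described above, assuming a naive fixed-size split. The token strings are placeholders rather than real CLIP tokenizer output, `chunk_tokens` and `wrap_chunk` are hypothetical helpers, and the webui's actual implementation (padding, handling of chunk boundaries) may differ. The 75 figure leaves room for CLIP's start and end markers inside its 77-token context window.

```python
# Sketch only: naive fixed-size chunking, not the webui's actual code.
CHUNK_SIZE = 75  # prompt tokens per chunk; CLIP's 77-token window also holds start/end markers
START, END = "<|startoftext|>", "<|endoftext|>"

def chunk_tokens(tokens, size=CHUNK_SIZE):
    """Split a flat token list into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def wrap_chunk(chunk):
    """Surround one chunk with CLIP's start/end markers (padding omitted)."""
    return [START, *chunk, END]

prompt_tokens = [f"tok{i}" for i in range(160)]  # hypothetical 160-token prompt
chunks = chunk_tokens(prompt_tokens)
print([len(c) for c in chunks])    # [75, 75, 10]
print(len(wrap_chunk(chunks[0])))  # 77
```

Each chunk is wrapped on its own, which is why a word whose tokens fall on both sides of a boundary ends up in two separately encoded pieces.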

If you put emotionless right at the end of your first chunk, it can end up split across the two chunks, with its first tokens in chunk one and the rest in chunk two.

This might not yield the desired result. Even when the word stays within one chunk, it is uncertain whether the network understands "emotionless" as intended. However, if the wiki is correct that each chunk is encoded separately, splitting emotionless across two chunks will certainly not produce the desired outcome.
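To spot this problem in a long prompt, one could check which words straddle a chunk boundary. This is a minimal sketch under stated assumptions: `words_split_across_chunks` is a hypothetical helper, and the word-to-token mapping is supplied by hand (the emo/tion/less split comes from the comment above), whereas a real tool would take it from CLIP's tokenizer.

```python
# Sketch only: the word -> subword-token mapping below is hypothetical input.
def words_split_across_chunks(word_tokens, size=75):
    """Return (word, first_chunk, last_chunk) for every word whose tokens span chunks."""
    split_words = []
    position = 0  # index of the current word's first token in the flat sequence
    for word, tokens in word_tokens:
        if not tokens:
            continue  # a word with no tokens cannot straddle a boundary
        first_chunk = position // size
        last_chunk = (position + len(tokens) - 1) // size
        if first_chunk != last_chunk:
            split_words.append((word, first_chunk, last_chunk))
        position += len(tokens)
    return split_words

# 73 single-token words leave room for only two of emotionless's three tokens.
words = [(f"w{i}", [f"w{i}"]) for i in range(73)]
words.append(("emotionless", ["emo", "tion", "less"]))
print(words_split_across_chunks(words))  # [('emotionless', 0, 1)]
```

Moving the word one token earlier (72 filler words instead of 73) lets all three tokens fit in the first chunk, and the function then returns an empty list.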

Here is a screenshot with my PR applied: https://github.com/AUTOMATIC1111/stable-diffusion-webui-tokenizer/pull/9

[Screenshot from 2024-06-13]