danielos123ssa opened 1 year ago
I don't have the expertise to confirm exactly how the neural network processes the text, but I can explain how the text is broken down into numbers for the network.
For example, the word emotionless is actually three tokens to the network (emo, tion, less). By default, SD takes 75 tokens at a time, as explained in a1111's wiki here.
If you put emotionless at the very end of your first chunk, it will be split across the chunk boundary.
This might not yield the desired result. Whether the network still understands the term "emotionless" as intended is uncertain. However, if the wiki is correct, splitting emotionless across two chunks will certainly not produce the desired outcome.
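To make the chunking behavior concrete, here is a minimal sketch in Python. The 75-token chunk size comes from the wiki; the token strings and the (emo, tion, less) split are illustrative assumptions, not the real CLIP BPE output.

```python
# A minimal sketch of fixed-size prompt chunking, assuming the 75-token
# chunk size described in the a1111 wiki. Token strings are placeholders,
# not real CLIP vocabulary entries.

CHUNK_SIZE = 75

def chunk_tokens(tokens, chunk_size=CHUNK_SIZE):
    """Split a flat token list into consecutive chunks of at most chunk_size."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

# Suppose the rest of the prompt tokenizes to 74 tokens, and then
# "emotionless" adds three more (emo, tion, less) at the very end.
prompt_tokens = ["tok"] * 74 + ["emo", "tion", "less"]
chunks = chunk_tokens(prompt_tokens)

print(len(chunks))    # -> 2: the word straddles the 75-token boundary
print(chunks[0][-1])  # -> "emo": only the first piece fits in chunk 1
print(chunks[1])      # -> ["tion", "less"]: the rest spills into chunk 2
```

Since each chunk is encoded separately, the network sees "emo" in one context and "tion", "less" in another, which is exactly the split the post above is worried about.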
Here is a screenshot from my PR: https://github.com/AUTOMATIC1111/stable-diffusion-webui-tokenizer/pull/9
I wish you'd be clearer about that, because I am interested. Is there a legend of sorts?