AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

Prompt blending #972

Closed dfaker closed 1 year ago

dfaker commented 1 year ago

Prompt blending suggested by @amotile in #930

Implemented as an optional prompt1@weight1 prompt2@weight2 syntax, default off in the settings.
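As an illustration of the syntax only, here is a minimal sketch of how a prompt in this form could be split into weighted sub-prompts. The regular expression and the parse_blend name are assumptions made for this example; they are not the parser used in this PR.

```python
import re

# Hypothetical pattern: lazily capture text up to "@<number>" that is followed
# by whitespace or the end of the prompt.
SUBPROMPT = re.compile(r"(.+?)@(\d+(?:\.\d+)?)(?=\s|$)", re.DOTALL)

def parse_blend(prompt):
    """Split 'prompt1@weight1 prompt2@weight2' into [(text, weight), ...]."""
    pairs = [(m.group(1).strip(), float(m.group(2)))
             for m in SUBPROMPT.finditer(prompt)]
    # A prompt without any @weight is treated as one sub-prompt of weight 1.
    return pairs or [(prompt.strip(), 1.0)]

print(parse_blend("a black cat@1 a white cat@10"))
```

Note that in this sketch any text after the last @weight is dropped, which a real implementation would need to decide how to handle.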

Some examples:

Various raw unblended elements of dragons:

image

And those prompts 'blended' in varying combinations:

image

dfaker commented 1 year ago

Very small variations in some of the complex scheduled prompts, I'm going to see if rearranging the conditionals generation fixes it.

AUTOMATIC1111 commented 1 year ago

isn't this literally the same thing that exists in other repos with :?

AUTOMATIC1111 commented 1 year ago

One clear downgrade is that now converting texts to their numeric representation is not batched, instead done one by one.

If we are to add this, we would use the same syntax as everywhere else, however I dislike it - just replacing : with @ does not help.

I'm also conflicted on whether I want to add this at all - I don't like this kind of prompt weighting, I want users to not use it, and not implementing it is a very effective way to do that.

dfaker commented 1 year ago

Batching and lookup was actually the resolution to the very minor visual differences on scheduled prompts, but even then the philosophical objections remain.

amotile commented 1 year ago

It is in fact exactly the same as other implementations of : weighting. The reason I wanted it here is to combine it with the other cool prompt stuff that's possible in this repo.

Regarding the prompt batching, I think it should be possible to keep that. I'm however very new to Python, so I just implemented it in the easiest way I could.

I agree that if it's included it would be better to do it with : as that's how other repos do it. (In my original gist that's what I did).

I'm curious: Why don't you like it? Is it this particular syntax or the concept as a whole?

shinkarom commented 1 year ago

This feature is needed. Prompt blending is almost the only thing missing to make this UI a complete package. (The other thing is the alternate inpainting implementation that leaves unmasked areas in-place.)

AUTOMATIC1111 commented 1 year ago

Wait a second, what do you mean leaves unmasked areas in-place? That's exactly what is happening.

shinkarom commented 1 year ago

If you currently mask some part of the body and ask to redraw, it doesn't overlay the original graphics, it draws from scratch. I guess I'm saying that inpainting needs a feature to overlay the original unmasked graphics on top of the generated masked ones.

Other than that, it only needs prompt blending to be perfect for me.

AUTOMATIC1111 commented 1 year ago

dfaker: do you mean to say that running in batch produced different results from running the same thing one-by-one?

amotile: Firstly, I hate the chosen syntax, : is a separator and putting weight after separator and next to the other token is horrible. Second, I hate that it blends whole prompts. Rather than blending the entire thing, you could blend just some terms in the prompt, without making multiple independent prompts - that would be a lot more intuitive for the user. Something like "a digital painting of {a fire|an ice (1.4)|a plasma (0.2)} dragon". This would be more difficult to implement, but everything that is needed for that already exists in sd_hijack.

shinkarom: there is mask mode original to keep the body part untouched at the start of inpainting.

shinkarom commented 1 year ago

I'll make a separate feature request about overlays.

dfaker commented 1 year ago

dfaker: do you mean to say that running in batch produced different results from running the same thing one-by-one?

Yes, very slight details:

fantasy landscape with a [mountain:lake:0.25] and [an oak:a christmas tree:0.75][ in foreground::0.6][ in background:0.25] [shoddy:masterful:0.5]

00858-503127863- _fantasy landscape with a  mountain_lake_0 25  and  an oak_a christmas tree_0 75  in foreground__0 6  in background_0 25   sh 00859-503127863- _fantasy landscape with a  mountain_lake_0 25  and  an oak_a christmas tree_0 75  in foreground__0 6  in background_0 25   sh

amotile commented 1 year ago

AUTOMATIC1111: That it's the syntax you don't like is great. Because I also find it lacking.

In my UI I add another layer of syntax on top of this feature, exactly because you get the best results when you repeat parts of the prompt in both "blended" parts.

This is sort of how I did it in mine:

a detailed photo of a:0
city with vines covering buildings, lens flare:0
{0} cyberpunk {1}:0.3
{0} fantasy {1}|0.7

but this is a pretty awful looking syntax and I do not think it would be a good fit here.

If you only have one weighting, we would expect it to work like this, right? a digital painting of {a fire|an ice (1.4)|a plasma (0.2)} dragon becomes:

a digital painting of a fire dragon:1
a digital painting of an ice dragon:1.4
a digital painting of a plasma dragon:0.2

and then the learned_conditioning are combined together

If you have multiple ones, it would multiply? a {digital(1)|oil(2)} painting of {a fire|an ice (10)} dragon becomes

a digital painting of a fire dragon:1   [1 * 1]
a oil painting of a fire dragon:2    [2 * 1]
a digital painting of an ice dragon:10  [1 * 10]
a oil painting of an ice dragon:20  [2 * 10]

This gives digital vs oil a 0.33 vs 0.67 weight and fire vs ice a 0.09 vs 0.91 weight. And that's exactly what I would expect.
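The expansion described above can be sketched in a few lines. This is only an illustration of the arithmetic, assuming the {option|option (weight)} form from this comment, a default weight of 1, and no nested groups; the expand and parse_option names are made up for the example and nothing like this exists in the repo.

```python
import itertools
import re

GROUP = re.compile(r"\{([^{}]*)\}")                  # one {...} group, no nesting
OPTION = re.compile(r"^(.*?)\s*\(([0-9.]+)\)\s*$")   # "an ice (10)" -> text, weight

def parse_option(text):
    """Split an option into (text, weight); the weight defaults to 1."""
    m = OPTION.match(text)
    if m:
        return m.group(1), float(m.group(2))
    return text, 1.0

def expand(prompt):
    """Return [(full_prompt, weight)] for every combination of group options."""
    parts = GROUP.split(prompt)  # even indices: literal text, odd: group bodies
    choices = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            choices.append([(part, 1.0)])
        else:
            choices.append([parse_option(o) for o in part.split("|")])
    result = []
    for combo in itertools.product(*choices):
        weight = 1.0
        for _, w in combo:
            weight *= w  # weights of the chosen options multiply
        result.append(("".join(t for t, _ in combo), weight))
    return result

for text, weight in expand("a {digital(1)|oil(2)} painting of {a fire|an ice (10)} dragon"):
    print(f"{text}:{weight:g}")
```

Summing the weights per option and dividing by the total (33 here) gives the shares quoted above: 11/33 vs 22/33 for digital vs oil, and 3/33 vs 30/33 for fire vs ice.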

as for the syntax there's no need for a start/end grouping of the weight since it's always going to just be one thing that can't contain the end token for the whole group.

So something like this would work just as well I think:

{digital|oil:2} painting of {a fire|an ice:10} dragon

amotile commented 1 year ago

if we use this instead a digital painting of {a fire|an ice@1.5} dragon

it doesn't break the scheduled prompts parser and this works. a digital painting of [{a fire|an ice@1.5} dragon:polar bear:0.5]

this would work with both: a digital painting of {a [fire:lava:0.5]@2|an ice@1.5} dragon

Even if it is better solved with/in sd_hijack, it still needs to go through get_learned_conditioning_prompt_schedules first, right?

Asmageddon commented 1 year ago

I'm also conflicted on whether I want to add this at all - I don't like this kind of prompt weighting, I want users to not use it, and not implementing it is a very effective way to do that.

In Midjourney, multiprompting is an extremely powerful tool. For example, instead of <description of A> next to <description of B> which gets blended together into an incoherent mess, you can prompt <description of A> next to B :: <description of B> next to A, and actually get two subjects most of the time.

It's an advanced feature for sure, but it could do great things. What if you could multiprompt CLIP embeddings directly? Take pictures you like, copy their embeddings, blend them together with whatever you'd like to get, and voila.


...syntax...

On the topic of syntax, perhaps it would be a good time to standardize basic syntax primitives instead of each script doing their own thing? E.g.:

Exact characters aside, I think most scripts that exist so far could be converted to work with this instead of needing to extend syntax, and it would look reasonably decent even with fairly complex examples, e.g.:

$clip_embedding :: {An ice@2::A fire} dragon flying above a {black neogothic castle |5| golden marble palace}. $my_modifiers --seed=42

This would generate a series of 5 images interpolating the different castle styles, each a blend of example_image and two text prompts, with your favorite artists and keywords inserted.

dfaker commented 1 year ago

$clip_embedding :: {An ice@2::A fire} dragon flying above a {black neogothic castle |5| golden marble palace}. $my_modifiers --seed=42

Have you considered just showing the user multiple sliders and letting them populate the input layer directly?

bmaltais commented 1 year ago

I really like this PR. This is the feature that was missing from webui that I used to like a lot as part of the lstein fork. There it is referred to as prompt weighting... but this would, in effect, produce the same result.

Looking forward to playing with this. I will grab the PR and see how it goes.

Asmageddon commented 1 year ago

@dfaker That's seriously uncalled for. My entire argument is to standardize prompt syntax and allow extending functionality without forcing every single new feature to invent new syntax and add new UI elements.

Generating and blending between multiple prompts is the most common script functionality, variables are an obvious approach to shorthands, grouping saves a lot of repeating surrounding text, and specifying parameters via text is convenient for sharing prompts.

dfaker commented 1 year ago

Ha, apologies, the final combination bringing it all together is so far from natural language that it's quite funny.

The claimed intent makes sense but it's still ridiculous: training a system to respond to what seems to be natural language and then throwing a grab bag of pseudo-programming syntax over the top of it!

bmaltais commented 1 year ago

@dfaker What is the current correct syntax for your PR? I am trying:

prompt: a black cat@1 a white cat@1 seed 1:

image

prompt: a black cat@1 a white cat@10 seed 1:

image

and I get mostly the same mixed cat. Am I typing it wrong?

dfaker commented 1 year ago

For this PR in particular, for me it comes down to determining the expressive power of the scheduling vs weighted tensor jiggling, if you can get some effect with the blending that you can't with the schedules then that seems like reason enough for me.

It certainly seems, as in the dragon example and a few others I saw in testing, that blending allows you to preserve a general shape while throwing subtle styles over it (although it does break down and totally switch images, so it's by no means continuous), but I've not played around with the scheduling enough to know how far that's possible there too.

and I get a black cat instead of a mostly white cat. Am I typing it wrong?

Looks fine - did you turn the option for it on in the settings tab?

bmaltais commented 1 year ago

Looks fine - did you turn the option for it on in the settings tab?

Oops, no, I did not. Let me look for it.

OK, now it does what I was hoping for:

Prompt: a black cat@1 a white cat@2 seed 1:

image

Prompt: a black cat@1 a white cat@10 seed 1:

image

Prompt blending "(weighting)" is not easy to master and can lead to "interesting" unexpected results... but overall, a nice tool in the toolkit.

dfaker commented 1 year ago

Prompt blending "(weighting)" is not easy to master and can lead to "interesting" unexpected results

Quite, an interesting one I got was:

xy_grid-0028-503127863-A fox@10 red eyes@1

Totally different to my expectations of 'blending' in a prompt for 'red eyes', but somehow apt too, not an effect one could plan to use however.

bmaltais commented 1 year ago

Here is an example of using this PR to merge two different outputs from two separate prompts to produce a third output that is a weighted mix of both:

prompt: portrait female commander shepard (amber heard), cyberpunk futuristic neon, hyper detailed, digital art, trending in artstation, cinematic lighting, studio quality, smooth render, unreal engine 5 rendered, octane render, Illustrated, Perfect face, fine details, realistic shaded, fine-face, pretty face seed: 1

image

Prompt: ultra realistic style illustration of a cute red haired (amber heard), sci - fi, fantasy, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, 8 k frostbite 3 engine, ultra detailed seed: 1

image

Mixing both prompt 1 for 1: female commander shepard (amber heard), cyberpunk futuristic neon, hyper detailed, digital art, trending in artstation, cinematic lighting, studio quality, smooth render, unreal engine 5 rendered, octane render, Illustrated, Perfect face, fine details, realistic shaded, fine-face, pretty face@1 ultra realistic style illustration of a cute red haired (amber heard), sci - fi, fantasy, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, 8 k frostbite 3 engine, ultra detailed@1 seed: 1

image

If you blend them 1:2 instead you get:

image

I had written a HOWTO for the lstein fork using this prompt weighting technique and the weighted seed variation technique to craft outputs: https://github.com/invoke-ai/InvokeAI/issues/359

I don't think webui can be used to weave multiple seeds with variable assigned weights yet... but hoping one day this feature is added by a brave soul.

amotile commented 1 year ago

@dfaker to me it seems like you can get blending effects with this that you can't with scheduling (or at least I can't as easily)

[fantasy landscape:fantasy landscape, fire:1] -> [fantasy landscape:fantasy landscape, fire:10] image

vs

{fantasy landscape:0|fantasy landscape, fire:1} -> {fantasy landscape:1|fantasy landscape, fire:0} image

amotile commented 1 year ago

@dfaker The problem with the fox example is that it sort of becomes less "fox" as you blend in the eyes (only 50% as much). I think this is why I have better results when I repeat that part of the prompt, and why the suggested syntax would be better.

then you would write something like A fox {|red eyes}

or in the current version:

A fox@1
A fox red eyes@1

image

bmaltais commented 1 year ago

@dfaker The problem with the fox example is that it sort of becomes less "fox" as you blend in the eyes (only 50% as much). I think this is why I have better results when I repeat that part of the prompt, and why the suggested syntax would be better.

then you would write something like A fox {|red eyes}

or in the current version:

A fox@1
A fox red eyes@1

I think this is why I have better results when I repeat that part of the prompt and why the suggested syntax would be better.

You are 100% right. Think of it as creating two different images, one for each prompt, for the same seed. Look at both and imagine what would happen if you blended them at different ratios. In the fox example, creating an image for red eyes clearly does not produce a fox... hence why it deviates so much... but in @amotile's example the fact that fox is specified for both prompts will keep the subject present in both and produce a better merge.

specblades commented 1 year ago

Are you guys talking about cross-attention or something else? There is a repo with stunning results that is, I think, of medium difficulty to use.

https://github.com/bloc97/CrossAttentionControl

fouranimals fourstyles fourseasons

shinkarom commented 1 year ago

This is already in. We're talking about a different thing.

bmaltais commented 1 year ago

@specblades No, those are two different techniques. This PR is about merging two or more prompt outputs for the same seed at different ratios.

Think of it as an artist blending different colors at different ratios to obtain a new color that has some of the characteristics of the blended ones.

It can sort of give a similar-looking result to CrossAttentionControl but using a different method. Look at my previous example above where I merge two prompts at a 50/50 ratio to get a third one. I provided the outputs of both prompts so you can see how they influence the one produced by the merge.

Combining both will be very powerful.

There is also another technique that I hope will be added where you can merge specific seeds with different weight to also influence the resulting output as part of combining CrossAttentionControl, Prompt Weighting and Seed weighting.

specblades commented 1 year ago

This is already in. We're talking about a different thing.

@shinkarom CrossAttentionControl already in automatic repo? Cant find docs

bmaltais commented 1 year ago

This is already in. We're talking about a different thing.

@shinkarom CrossAttentionControl already in automatic repo? Cant find docs

When you use [snowy mountain:desert mountain:0.3] in a prompt you are doing CrossAttentionControl.

amotile commented 1 year ago

@bmaltais interesting, how would you do the -fog examples with what's in already?

Honestly, I don't even quite understand how you would do it with the linked one. The writeup is very confusing.

Either way it seems outside the scope of this PR.

shinkarom commented 1 year ago

Here it's (parentheses) for +term and [brackets] for -term.

bmaltais commented 1 year ago

@bmaltais interesting, how would you do the -fog examples with what's in already?

Honestly, I don't even quite understand how you would do it with the linked one. The writeup is very confusing.

Either way it seems outside the scope of this PR.

Indeed multiple seed blending is another technique used to obtain finer results and is something outside of this PR... but interesting none the less.

bmaltais commented 1 year ago

Here it's (parentheses) for +term and [brackets] for -term.

I thought it was the [<from>:<to>:<weight>] syntax... so what is this called? I am losing track ;-)

shinkarom commented 1 year ago

The syntax you've mentioned is Target Replacement. The one I've mentioned is Direct Token Attention Control.

Outside the scope of this PR, of course.

dfaker commented 1 year ago

There's two, the scheduled prompt editing with [from:to:when] and a token based attention with (increased attention) [decreased attention] and with ((((((((((multiple brackets)))))))))) for a stacked (10%?) stronger effect.

bmaltais commented 1 year ago

Getting back to the subject of the PR. From the testing I have done so far I am very pleased with the result. Hope this will get merged in the main branch soon.

bmaltais commented 1 year ago

@dfaker I noticed a warning that there are too many tokens... I don't think this should be triggered, since this is handled as two sub-prompts that are individually assessed? Possibly the token count evaluation needs to be moved into the sub-prompt routine instead of running on the full combined prompt field? See the warning below in the info of the output image:

portrait female commander shepard (amber heard), cyberpunk futuristic neon, hyper detailed, digital art, trending in artstation, cinematic lighting, studio quality, smooth render, unreal engine 5 rendered, octane render, Illustrated, Perfect face, fine details, realistic shaded, fine-face, pretty face@1

ultra realistic style illustration of a cute red haired (amber heard), sci - fi, fantasy, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, 8 k frostbite 3 engine, ultra detailed@2
Steps: 20, Sampler: Euler, CFG scale: 7, Seed: 1, Size: 512x512, Model hash: 7460a6fa

Warning: too many input tokens; some (35) have been truncated:
, intricate , elegant , highly detailed , digital painting , artstation , concept art , smooth , sharp focus , illustration , 8 k frostbite 3 engine , ultra detailed @ 2
dfaker commented 1 year ago

Are there any other approaches for the weighting other than weighted sum? I tried slerp on the 2 value ones but that obviously falls down for higher counts, but it was an interesting difference, particularly in the middle of the interpolation.
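For reference, the two combination strategies mentioned here can be sketched as follows. This is a rough illustration on plain Python lists standing in for conditioning tensors; normalizing the weights so they sum to 1 is an assumption, the function names are made up for the example, and the inputs are assumed to be non-zero vectors of equal length.

```python
import math

def weighted_sum(vectors, weights):
    """Blend equal-length vectors by their weights, normalized to sum to 1."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(dim)]

def slerp(a, b, t):
    """Spherical interpolation from a (t=0) to b (t=1); takes exactly two inputs."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < 1e-6:
        # Nearly parallel (or opposite) vectors: fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    coeff_a = math.sin((1 - t) * omega) / sin_omega
    coeff_b = math.sin(t * omega) / sin_omega
    return [coeff_a * x + coeff_b * y for x, y in zip(a, b)]
```

A weighted sum takes any number of inputs, while slerp is defined between two vectors only, which is why it does not extend directly to three or more sub-prompts.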

dfaker commented 1 year ago

@dfaker I noticed a warning that there are too many tokens...

Good catch, I'll address that prior to moving out of draft if given a green light. Thinking about it, it could be a more general mis-triggering for the other rewritings too 🤔

bmaltais commented 1 year ago

Are there any other approaches for the weighting other than weighted sum? I tried slerp on the 2 value ones but that obviously falls down for higher counts, but it was an interesting difference, particularly in the middle of the interpolation.

Hmm... I was possibly thinking of using the number of tokens in a prompt as part of the weight? The more tokens a prompt has, the more overall weight it should have? If you put in more tokens, it is because you worked harder at getting the resulting style... but does this warrant a higher weight? Probably not... but just a thought.

bmaltais commented 1 year ago

Are there any other approaches for the weighting other than weighted sum? I tried slerp on the 2 value ones but that obviously falls down for higher counts, but it was an interesting difference particularly in the middle of the interpolation.

What about shifting the weight progressively as the steps are moving forward?

Say for 10 steps... on step 1 you apply 10% of the sub-prompt shifting, on step 2 20%, etc... until the full effect takes place at step 10. This might be another technique that is separate from this PR... like slowly shifting from one prompt to the other. It might be interesting for users who want to do animation between two prompts... but that would not really be that. I'm not sure what that would actually be called or what it would be good for. It is more like a linear-scale application of the prompt weights across the total steps. Perhaps this would make the red-eyed fox example work better, given the fox would have more weight at the beginning.

bmaltais commented 1 year ago

Are there any other approaches for the weighting other than weighted sum? I tried slerp on the 2 value ones but that obviously falls down for higher counts, but it was an interesting difference particularly in the middle of the interpolation.

@dfaker another possibility that could be interesting:

Maybe this could be used in the same way as the "Target replacement", where you specify when the sub-prompt weight should take effect... So this would be called "Prompt Target Influence" and be used like this:

white cat@2:0.5 black cat@1:0 steps 10

meaning the white cat prompt will only start to be applied on step 10 * 0.5 = 5, and black cat would apply from the beginning because 10 * 0 = 0... so starting at step 0.

The part to start applying could be optional. If not provided it start at step 0.
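To make the proposal concrete, a minimal sketch of the selection rule could look like this. The (text, weight, start) tuple layout and the active_subprompts name are assumptions made for the example, and this behavior is not part of the PR.

```python
def active_subprompts(subprompts, step, total_steps):
    """Return (text, weight) pairs whose start fraction has been reached at `step`."""
    return [
        (text, weight)
        for text, weight, start in subprompts
        if step >= int(total_steps * start)  # e.g. 10 steps * 0.5 -> step 5
    ]

# white cat@2:0.5 black cat@1:0, with a start fraction of 0 when none is given
cats = [("white cat", 2.0, 0.5), ("black cat", 1.0, 0.0)]
for step in (0, 4, 5):
    print(step, active_subprompts(cats, step, total_steps=10))
```

With 10 steps, only the black cat sub-prompt is active before step 5, and both are blended from step 5 onward.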

dfaker commented 1 year ago

I was thinking more of the internal mechanism of how the conditionings are combined rather than that level of behavioral change.

amotile commented 1 year ago

I have a feeling that this syntax isn't going to get the green light (and I do agree that the other one is better)

But with both I don't think you need any special syntax to get it working with the target replacement. It would just work:

As I mentioned earlier:

if we use this a digital painting of {a fire|an ice@1.5} dragon

it works with this (switches from the fire/ice dragon to a polar bear, halfway through): a digital painting of [{a fire|an ice@1.5} dragon:polar bear:0.5]

and this (switches from fire to lava halfway): a digital painting of {a [fire:lava:0.5]@2|an ice@1.5} dragon

even this would work (switch from weight 2 ->10 halfway): a digital painting of {a fire@[2:10:0.5]|an ice@1.5} dragon

bmaltais commented 1 year ago

What if you want to merge three prompts with different weights? I don't see how this would work in that case.


amotile commented 1 year ago

like this a digital painting of {a fire|an ice@1.5|a plasma@10} dragon

bmaltais commented 1 year ago

But if the 3 prompts are totally unrelated but produce similar images I want to merge... Like in my example above where I mix the two prompt generating a sci Fi girl?


bmaltais commented 1 year ago

Would that be

@.**@*.**@*.***} For a 1:1:1 mix?
