brycedrennan / imaginAIry

Pythonic AI generation of images and videos

MIT License

inputting prompt weights via text prompt #136

Open PaulMest opened 1 year ago

PaulMest commented 1 year ago

The Stable Diffusion community has a lot of examples that use weights in prompts. For example: 'Cat with !black nose! !!blue eyes!!' should have a higher weighting on black nose and an even higher weighting on blue eyes.

I just realized that prompt weights are not implemented in imaginAIry. But there appears to be something similar implemented for clip masking. Could we reuse the same parsing approach from clip masks for prompt weights? Or could you outline the steps that you think are needed to allow for prompt weighting from a text string?

Looks like there's also some sample code in this Reddit thread

brycedrennan commented 1 year ago

Prompt weights are supported via the Python library but not via text prompts. I have been hesitant to implement them because I don't like the widely used syntaxes, and I wasn't sure they add enough value to justify their existence. Any prompt weighting syntax may also conflict with future syntax for more interesting features.

Yes, the approach used for the clip masking prompts is a far more robust solution than what's going on in a lot of other codebases, which take a more ad hoc parsing approach.

All that being said, there is value in just using the syntax people already expect. I'd accept a pull request that implements whatever is widely used. I'd also accept a new syntax that feels more programmatic and allows for future additions.

brycedrennan commented 1 year ago

You seem to have good judgment. Probably I'll be happy with whatever direction you want to go in.

brycedrennan commented 1 year ago

Also, I suspect that "weighting" things doesn't work the way people think and can make the images worse. I'd need to dive into how the other repos are doing it. In this repo, something like "a happy giant !!!dog!!!" would get translated to two separate prompts, "a happy giant" (weight 1) and "dog" (weight 3), which would create a picture of a hybrid humanoid giant and dog.
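To make that translation concrete, here is a minimal sketch of how a text prompt could be split into weighted sub-prompts. It is not code from this repo: the function name `split_weighted_prompt`, the regex, and the rule "N exclamation marks on each side means weight N" are all assumptions chosen to match the example above.

```python
import re

# Assumed syntax (hypothetical): a span wrapped in N "!" on each side gets weight N.
_BANG_SPAN = re.compile(r"(!+)([^!]+)\1")


def split_weighted_prompt(prompt: str) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) pairs.

    Unmarked text is collected into one base sub-prompt with weight 1.0;
    each bang-wrapped span becomes its own sub-prompt.
    """
    emphasized = []
    plain_chunks = []
    pos = 0
    for match in _BANG_SPAN.finditer(prompt):
        plain_chunks.append(prompt[pos:match.start()])
        weight = float(len(match.group(1)))  # number of "!" on one side
        emphasized.append((match.group(2).strip(), weight))
        pos = match.end()
    plain_chunks.append(prompt[pos:])

    # Collapse the whitespace left behind where spans were cut out.
    base = " ".join(" ".join(plain_chunks).split())
    parts = [(base, 1.0)] if base else []
    return parts + emphasized


print(split_weighted_prompt("a happy giant !!!dog!!!"))
# [('a happy giant', 1.0), ('dog', 3.0)]
```

The resulting pairs are the kind of data one would hand to the library's existing weighted-prompt support. Note that under this counting rule a single-bang span like `!black nose!` gets the same weight as the unmarked text, so a real syntax would need to decide how bang counts map to weights.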

Very possible others have more clever approaches than how weighting works in this repo.
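One plausible reading of the "hybrid" result, sketched under the assumption that each sub-prompt is encoded separately and the encodings are averaged by normalized weight (the thread does not confirm the exact mechanism). The `embed` callable and the two-dimensional toy vectors are hypothetical stand-ins for a real text encoder.

```python
def blend_conditionings(weighted_prompts, embed):
    """Weighted average of per-prompt embedding vectors.

    weighted_prompts: list of (text, weight) pairs.
    embed: hypothetical callable mapping text to a list of floats.
    """
    total = sum(weight for _, weight in weighted_prompts)
    shares = [(embed(text), weight / total) for text, weight in weighted_prompts]
    dim = len(shares[0][0])
    return [sum(vec[i] * share for vec, share in shares) for i in range(dim)]


# Toy encoder: axis 0 stands for "giant-ness", axis 1 for "dog-ness".
toy_vectors = {"a happy giant": [1.0, 0.0], "dog": [0.0, 1.0]}

blended = blend_conditionings(
    [("a happy giant", 1.0), ("dog", 3.0)],
    toy_vectors.__getitem__,
)
print(blended)  # [0.25, 0.75]
```

In this toy picture the blended vector sits between the two concepts rather than describing a giant that happens to own a dog, which is why splitting a phrase out of its sentence can change the meaning instead of just emphasizing it.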

PaulMest commented 1 year ago

Thanks for the thoughts/guidance. I will think about it a bit more and update this thread with my thoughts or a link to a PR if I make any meaningful progress.