AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[FEATURE] Automatic prompt correction, tracking and sharing #2764

Open Tophness opened 2 years ago

Tophness commented 2 years ago

There are countless threads on Facebook and Reddit of people sharing prompts they can't get to work. Horses come out twice the length of regular horses, or whatever. Someone thinks they have a fix, but it doesn't apply in that user's specific context. Someone chimes in to say this is just a crude tool that will always need constant human oversight. This is laborious, inefficient and imprecise, and it relies on the right people — those who happened to have the same problem and found a solution — seeing the thread. I propose we add optional UI elements to intelligently track changes between prompts (for our own sake too), rate changes as positive or negative, add a checkbox to automatically share them online (not sure which API to use, but it needs to be free), and show inline suggestions when previously shared fixes apply to your prompt.
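
The inline-suggestion idea above could start as something very simple: scan the prompt for phrases that shared fixes apply to. A minimal sketch, where the fix store, its fields, and the rating threshold are all hypothetical placeholders for whatever shared backend ends up being used:

```python
# Hypothetical local cache of community-shared fixes.
# problem phrase -> (suggested replacement, community rating)
SHARED_FIXES = {
    "distorted limbs": ("mutilated", 12),
    "extra fingers": ("too many fingers", 7),
}

def suggest_fixes(prompt: str, min_rating: int = 5):
    """Return (phrase, replacement) pairs for known-problematic phrases in the prompt."""
    prompt_lower = prompt.lower()
    return [
        (phrase, replacement)
        for phrase, (replacement, rating) in SHARED_FIXES.items()
        if phrase in prompt_lower and rating >= min_rating
    ]

print(suggest_fixes("a horse, distorted limbs, extra fingers"))
```

The `min_rating` gate is one way to keep unverified community fixes from being suggested inline.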

When you replace a prompt with one that's entirely different to start a new image, regex similarity tests should catch most occurrences automatically, but there should also be a button to manually tell it you've started a new scene in case of a false negative. (This has the added bonus of making folder management easier, since we can lump all prompt attempts into one scene folder and manage them intelligently via history, so it's worth the user's time.) I'm not sure of the best way to implement the UI. Perhaps it could all be overlaid inline in the prompt box: e.g. if you changed the order of words in the same prompt, it would show arrows just above it pointing to where they've moved; if you replaced a word, it would be highlighted in a certain colour; etc. A generic diff comparison plus some regex is already a big improvement over a vanilla prompt box. We'll also need a way to rate other users' fixes — maybe something like a git structure would work. This all adds extra clutter to the UI, so I suggest a button below the prompts that drops these new features down into a whole new row, hidden by default so it's entirely optional.
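
The diff tracking and new-scene detection could both come from the standard library before any custom regex is needed. A rough sketch using `difflib` (the 0.4 threshold is an arbitrary guess that would need tuning):

```python
import difflib

def prompt_diff(old: str, new: str):
    """Word-level diff between two prompts, as (op, word) pairs."""
    ops = []
    for token in difflib.ndiff(old.split(), new.split()):
        tag, word = token[0], token[2:]
        if tag in "+-":
            ops.append(("added" if tag == "+" else "removed", word))
    return ops

def is_new_scene(old: str, new: str, threshold: float = 0.4) -> bool:
    """Heuristic: below-threshold similarity means the user started a new scene."""
    ratio = difflib.SequenceMatcher(None, old, new).ratio()
    return ratio < threshold

print(prompt_diff("a brown horse in a field", "a black horse in a field"))
```

The `prompt_diff` output is what the inline arrows and colour highlights would be rendered from; `is_new_scene` is the automatic check the manual "new scene" button would override on a false negative.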

While we're at it, we should also include all the other generation parameters when it sends a correction to the database so we can compare relative differences: e.g. if a prompt fix only applies at a certain CFG scale, or with a certain sampler on a certain model, we don't have to suggest it elsewhere. This has the added bonus that we can use all this data to improve the dataset — either for official Stability training (let's ignore for now how awkward collaboration between this repo and Emad would be) or for some sort of model we can all train ourselves in a cluster: say, a giant "fix everything" hypernetwork, a set of inter-related embeddings, or an alternate Dreambooth that uses multiple class regularizers. For a single person this would be an insurmountable effort, but if we share the work we could all end up with a model that has the sensible fixes, like distorted limbs and extra fingers, included by default. It might turn out that isn't even necessary: if, say, "distorted limbs" as a negative prompt weighs too much in some context, "mutilated" might be the fix, and with enough user input we might be able to do this for everything.
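
A sketch of what a shared correction record could carry and how the context filter might work — all field names and the CFG tolerance are illustrative, not a proposed schema:

```python
def fix_applies(record: dict, settings: dict, cfg_tolerance: float = 2.0) -> bool:
    """Only suggest a fix when model and sampler match and CFG scale is close."""
    return (
        record["model"] == settings["model"]
        and record["sampler"] == settings["sampler"]
        and abs(record["cfg_scale"] - settings["cfg_scale"]) <= cfg_tolerance
    )

# Example shared record: the fix plus the generation context it was found in.
record = {
    "model": "sd-v1-5", "sampler": "Euler a", "cfg_scale": 7.0,
    "bad": "distorted limbs", "fix": "mutilated",
}

print(fix_applies(record, {"model": "sd-v1-5", "sampler": "Euler a", "cfg_scale": 8.0}))
```

Filtering like this is what keeps a fix that only works at one CFG scale or on one model from being suggested globally.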

Going forward we could maybe use something like DAAM (on GitHub) to track relative changes visually, and perhaps feed the database of corrections into a large language model to extrapolate patterns we can't see as humans, replacing this whole system with something like an autocorrect from plain English into prompt-engineering language. Maybe these are unrealistic goals, but I think it's important we at least start collecting the data to see how attainable they are.

Tophness commented 2 years ago

I realize this is a lot of work, so I wouldn't suggest it if I weren't planning on contributing. I'm just working out how to use Gradio for the first time. I've only got some basic input/output happening with new input boxes and buttons so far, so there's no point making a commit yet.

precompute commented 2 years ago

What you need is a website that'd implement this.

Or else, you can write a script to make these things happen: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#user-scripts

Tophness commented 2 years ago

I'm talking to the devs from lexica.art about adding sending to their API. I already have it receiving search results from theirs for a live preview of similar prompts as you type. That would mean they could potentially use their dataset to train a model that filters out prompt/image sets that are broken or not aesthetic, which is the same model we could use to make this fully automated and offline. I'm currently adding it to modules/ui.py. I wasn't aware user scripts let you mod the whole interface like that?
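
For reference, the live-preview side could be as small as building a search URL and pulling prompts out of the response. This is a sketch only: the endpoint path and the `images`/`prompt` response fields are assumptions about lexica.art's public search API, and the network request itself is omitted:

```python
import json
from urllib.parse import quote

# Assumed public search endpoint and response shape -- verify against lexica.art's docs.
API_BASE = "https://lexica.art/api/v1/search"

def search_url(query: str) -> str:
    """Build the search request URL for a prompt typed so far."""
    return f"{API_BASE}?q={quote(query)}"

def extract_prompts(payload: str):
    """Pull prompt text out of a search response (assumed JSON shape)."""
    return [img["prompt"] for img in json.loads(payload).get("images", [])]

# A fabricated example payload, just to show the parsing.
sample = '{"images": [{"prompt": "a horse in a field", "src": "x"}]}'
print(search_url("brown horse"))
print(extract_prompts(sample))
```

The extracted prompts are what the live preview would display next to the user's own prompt as they type.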

Tophness commented 2 years ago

Added some screenshots. @AUTOMATIC1111 are these features something you'd consider? Should I be doing this in custom scripts or the main UI? I know I'm loading the previous prompts wrong; I'm guessing the right way is via the shared module. I'm just getting the basic concept started.

Tophness commented 2 years ago

Still needs a lot of work, but you can now left-click and right-click on words to build a prompt correction, submit prompts, correct your prompt, etc. It's not inline yet because there's no way to rate or verify previously submitted prompt corrections, and it's all very ugly, but the basic concept works. Once prompt parsing works better, it should be possible to get it down to just clicking on the correction, which means we can ditch the prompt builder input boxes entirely, and, possibly with Firebase for the backend, use some machine learning to not need any clicks at all. The Firebase free tier should also cover simple text storage and retrieval like this.
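
The rate/verify step blocking inline display could be modelled like this — an in-memory stand-in for whatever backend (Firebase or otherwise) ends up storing the votes, with the "verified" bar of 3 net votes picked arbitrarily:

```python
from collections import defaultdict

# Stand-in for the shared backend: correction id -> vote tallies.
VOTES = defaultdict(lambda: {"up": 0, "down": 0})

def vote(correction_id: str, positive: bool):
    """Record one user's rating of a shared correction."""
    VOTES[correction_id]["up" if positive else "down"] += 1

def is_verified(correction_id: str, min_net: int = 3) -> bool:
    """A correction is shown inline only once its net votes clear the bar."""
    tally = VOTES[correction_id]
    return tally["up"] - tally["down"] >= min_net

for _ in range(4):
    vote("distorted-limbs->mutilated", positive=True)
vote("distorted-limbs->mutilated", positive=False)
print(is_verified("distorted-limbs->mutilated"))  # 4 - 1 = 3, prints True
```

Swapping the dict for Firebase reads/writes wouldn't change the verification logic, only where the tallies live.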