AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Feature Request]: Add color loss selection in txt2image #6546

Open Hellisotherpeople opened 1 year ago

Hellisotherpeople commented 1 year ago

Is there an existing issue for this?

What would your feature do ?

Hi all, I've been OBSESSED with the idea of color constraining the output of Stable Diffusion models. I'm aware of some work that partially implements my vision here, namely the color-sketch tool merged in https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/4459

I believe there are at least 3 distinct ways to implement some kind of "color control" within diffusion models.

  1. Use color-sketch in-painting / text2img to pre-color the image to be diffused, which should encourage (but not guarantee) the shaded regions have the specified color.

  2. Modify the loss function of the final step to encourage the model to generate the specified colors for particular pixels.

  3. Add some kind of "filter" to the final diffusion step, banning the diffusion model from producing pixels which fall outside of a given RGB color range. I already wrote about this idea in the future work section of my paper, which applies a very similar technique to transformer language models to generate lipogrammatic text (https://aclanthology.org/2022.cai-1.2/). This is equivalent to "limiting" the model's vocabulary, but instead of a vocabulary of tokenized text, it has a vocabulary of all possible RGB values for each pixel.
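
To make option 3 concrete, here is a minimal sketch of what such a "filter" could look like if it were applied directly to pixel values: every channel is clamped into an allowed range, which is the pixel analogue of masking banned tokens. The function name `project_to_color_range` and the nested-list image layout are assumptions made for illustration. Stable Diffusion denoises latents rather than pixels, so where a projection like this would actually be applied is an open question this sketch does not answer.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]  # rows of (r, g, b), each channel in [0, 1]


def project_to_color_range(image: Image, low: Pixel, high: Pixel) -> Image:
    """Clamp each channel of each pixel into [low, high] for that channel.

    Hypothetical helper: it acts on plain pixel values, not on latents.
    """
    return [
        [
            tuple(min(max(value, lo), hi)
                  for value, lo, hi in zip(pixel, low, high))
            for pixel in row
        ]
        for row in image
    ]


# Allow only reddish pixels: strong red, little green or blue.
image = [[(0.2, 0.9, 0.5), (1.0, 0.1, 0.0)]]
filtered = project_to_color_range(
    image, low=(0.6, 0.0, 0.0), high=(1.0, 0.3, 0.3)
)
print(filtered)
```

The first pixel is pulled into the allowed range, while the second, which is already inside it, passes through unchanged. Unlike the loss in option 2, this is a hard constraint: no pixel can end up outside the range, whatever the prompt says.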

There is already a tool for some kind of color control, implementing 1 within this webui. However, this tool only works for in-painting (and not the main txt2image), and I believe this technique is somewhat inferior to 2, and especially 3, because the model can and will ignore the given colors depending on the prompt.

I ran across a really neat notebook which demonstrates a unique way to modify the loss function used during diffusion (2). This enables fine-grained color control of the image. The original notebook is here: https://colab.research.google.com/drive/1dlgggNa5Mz8sEAGU0wFCHhGLFooW_pf1?usp=sharing (see the bottom). I have modified this notebook to include a more sophisticated example where a user can specify (in code) bounding box locations, the rgb channel they want to modify, and the value they want to modify it with. This modified notebook is located here: https://colab.research.google.com/drive/18Kl1DndYEQOlKg9yP3uuNwRHajKqHBkh?usp=sharing
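
As a rough illustration of the idea behind option 2 (not the notebook's actual code), the sketch below defines a loss that measures how far one RGB channel is from a target value inside a bounding box, then takes one gradient step to reduce it. All names (`color_loss`, `color_loss_grad`, `guided_update`) and the nested-list image layout are assumptions. The gradient here is computed analytically on pixels; in a real pipeline it would presumably come from automatic differentiation back to the latents, which is omitted.

```python
from typing import List, Tuple

Image = List[List[List[float]]]  # image[y][x][channel], values in [0, 1]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def color_loss(image: Image, box: Box, channel: int, target: float) -> float:
    """Mean squared distance between one channel and a target inside the box."""
    x0, y0, x1, y1 = box
    n = (x1 - x0) * (y1 - y0)
    return sum((image[y][x][channel] - target) ** 2
               for y in range(y0, y1) for x in range(x0, x1)) / n


def color_loss_grad(image: Image, box: Box, channel: int, target: float) -> Image:
    """Gradient of color_loss per pixel value; zero outside the box."""
    x0, y0, x1, y1 = box
    n = (x1 - x0) * (y1 - y0)
    grad = [[[0.0 for _ in pixel] for pixel in row] for row in image]
    for y in range(y0, y1):
        for x in range(x0, x1):
            grad[y][x][channel] = 2.0 * (image[y][x][channel] - target) / n
    return grad


def guided_update(image: Image, grad: Image, scale: float) -> Image:
    """One gradient-descent step; `scale` controls how hard the loss pulls."""
    return [[[v - scale * g for v, g in zip(pixel, gpixel)]
             for pixel, gpixel in zip(row, grow)]
            for row, grow in zip(image, grad)]


# 2x2 gray image; ask for more blue (channel 2) in the left column.
image = [[[0.5, 0.5, 0.5] for _ in range(2)] for _ in range(2)]
box, channel, target = (0, 0, 1, 2), 2, 0.9

before = color_loss(image, box, channel, target)
grad = color_loss_grad(image, box, channel, target)
updated = guided_update(image, grad, scale=0.5)
after = color_loss(updated, box, channel, target)
print(before, after)
```

Only the blue channel of the boxed pixels moves; everything else receives a zero gradient and is left alone. Because the loss only nudges the image, the sampler can still trade color fidelity against the prompt, which is the main difference from a hard filter.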

My ideal solution for this is 3, but I'm pretty ignorant about how I'd implement this given the code that I've seen. Certainly more involved than it was for NLP transformer models.

Proposed workflow

  1. Go to the main txt2image default panel
  2. Have a tool, similar to the gradio color selection tool, with which a user can draw on a blank canvas
  3. Have an "intensity" slider to allow users to decide how much they want the custom loss to impact their image.
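
The workflow above could feed a loss along these lines: painted canvas cells become color targets, unpainted cells are ignored, and the "intensity" slider becomes a multiplier on the loss. This is a sketch under assumptions, not a proposal for the actual extension code. The canvas representation (`None` for unpainted cells) and the name `canvas_color_loss` are hypothetical, and no UI code is shown.

```python
from typing import List, Optional, Tuple

Color = Tuple[float, float, float]
Image = List[List[Color]]
Canvas = List[List[Optional[Color]]]  # None marks an unpainted cell


def canvas_color_loss(image: Image, canvas: Canvas, intensity: float) -> float:
    """Intensity-weighted mean squared distance to the painted colors.

    Unpainted cells contribute nothing, so the model stays free there.
    """
    total = 0.0
    painted = 0
    for image_row, canvas_row in zip(image, canvas):
        for pixel, wanted in zip(image_row, canvas_row):
            if wanted is None:
                continue
            total += sum((p - w) ** 2 for p, w in zip(pixel, wanted))
            painted += 1
    if painted == 0:
        return 0.0  # nothing painted: the custom loss is switched off
    return intensity * total / painted


# One painted cell asking for pure blue where the image is black.
image = [[(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
canvas = [[None, (0.0, 0.0, 1.0)]]
print(canvas_color_loss(image, canvas, intensity=0.5))
```

With the slider at zero the loss vanishes and generation proceeds as plain txt2image; raising it makes the painted colors weigh more heavily against the prompt.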

Additional information

No response

Hellisotherpeople commented 1 year ago

I'm basically begging the community to come together and do a more detailed write up on this, particularly about comparing the various techniques.

If no one else works on this, I'll be "forced" to do the obvious thing and find a workshop at some high-end CV conference and write a small paper (one author only) about why good color loss selection is important and why 2 (and maybe 3) are superior to the currently implemented technique 1. I'd really love to split first author with someone who's good at writing Automatic1111 extensions, so that this can proliferate in the community.

Or even better, someone from the community can explain to me that this already exists, or why I'm a big-dum-dum head and why the current solution (1) is actually superior. Right now I feel like I'm yelling into the wind about this...

Hellisotherpeople commented 1 year ago

Ugh this is going to languish for months before being randomly closed due to inactivity isn't it?

:(

Hellisotherpeople commented 1 year ago

Why wouldn't this be a huge feature that people would be desperate for? Color choice is very important for generating high-quality images.