AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Feature Request]: Latent Perturbation #4164

Open acid1103 opened 2 years ago

acid1103 commented 2 years ago


What would your feature do ?

A video outlining the requested feature can be found here (timestamp included).

This is somewhat related to these two issues:

In essence, latent perturbation would provide finer-grained control over image variations than the current variations implementation, and it would allow recursive variations (at the cost of no longer being able to save the full parameters in PNG metadata). It works by taking the initial latent given to the scheduler, perturbing it by some scale factor, and running the perturbed latent through the scheduler. The result is a variation of the image generated from the original latent, and the scale factor determines how much the two images differ.
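A rough sketch of the perturbation step, using NumPy rather than the web UI's own tensors. The function name `perturb_latent` and the square-root blend are my assumptions for illustration; the idea only requires that a small scale factor gives a latent close to the original and a large one gives a latent far from it.

```python
import numpy as np

def perturb_latent(latent, scale, rng):
    """Blend `latent` with fresh Gaussian noise (hypothetical helper).

    scale=0 returns the latent unchanged; scale=1 replaces it with new noise.
    The sqrt weighting keeps the result at roughly unit variance, assuming
    the input latent is itself unit-variance Gaussian noise.
    """
    if not 0.0 <= scale <= 1.0:
        raise ValueError("scale must be in [0, 1]")
    noise = rng.standard_normal(latent.shape)
    return np.sqrt(1.0 - scale**2) * latent + scale * noise

rng = np.random.default_rng(0)
# 4x64x64 is the latent shape for a 512x512 image in Stable Diffusion 1.x.
base = rng.standard_normal((4, 64, 64))
near = perturb_latent(base, 0.1, rng)  # should give a subtle variation
far = perturb_latent(base, 0.8, rng)   # should give a very different image
```

Plain addition (`latent + scale * noise`) would also work as a sketch, but it inflates the variance of the latent as the scale grows, which is why the blend is used here.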

Latent perturbation allows something similar to a binary search through image space. You start with an image that's "good enough," take its initial latent, generate a few perturbed variations at a relatively high scale factor, and run those. If you like one of the results, you take that image's initial latent, perturb it by a smaller scale factor, and generate a new batch of less varied images. Repeat until you've found the perfect image.
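The coarse-to-fine search described above could be sketched like this. Everything here is hypothetical: `refine`, `pick_best`, and the schedule of scale factors are names I made up, and in real use `pick_best` would be a person looking at the rendered images. In this toy version it picks the candidate closest to a hidden target latent so the loop can run on its own.

```python
import numpy as np

def perturb(latent, scale, rng):
    # Same variance-preserving blend as a single perturbation step (an assumption).
    noise = rng.standard_normal(latent.shape)
    return np.sqrt(1.0 - scale**2) * latent + scale * noise

def refine(latent, pick_best, scales, batch_size, rng):
    """Repeatedly perturb the chosen latent with shrinking scale factors."""
    for scale in scales:
        # Keep the current latent as a candidate so a round can never make things worse.
        candidates = [latent] + [perturb(latent, scale, rng) for _ in range(batch_size)]
        latent = pick_best(candidates)
    return latent

rng = np.random.default_rng(0)
target = rng.standard_normal((4, 8, 8))  # stand-in for "the perfect image"
start = rng.standard_normal((4, 8, 8))   # stand-in for the "good enough" image

def closest_to_target(candidates):
    # Toy replacement for the human choosing their favourite variation.
    return min(candidates, key=lambda c: np.linalg.norm(c - target))

result = refine(start, closest_to_target, scales=[0.8, 0.4, 0.2, 0.1],
                batch_size=8, rng=rng)
```

Note that only the final latent needs to be kept to reproduce the final image; it is the chain of intermediate choices that doesn't fit neatly into PNG metadata.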

As mentioned, the downside is that embedding these steps in PNG metadata is infeasible. Personally, this is a tradeoff I'm okay with, but I realize it might be a point of contention.

Proposed workflow

  1. Press the "Send to Variations" button under a generated image. (This would take you to img2img -> Variations. Uploading images there wouldn't be possible, because the image's initial latent is required.)
  2. Select your scale factor, sampling steps, sampling method, batch parameters, etc...
  3. Click generate
  4. Select a better variation of the input image
  5. Click "Send to Variations"
  6. Go to step 2

Additional information

Obviously this is a relatively big ask. I've looked through the code, and the way initial latents are currently generated doesn't lend itself well to this, not to mention the work required for the workflow and UI changes. Personally, I would be more than happy with a relatively simple change that would let me achieve this functionality with a script. Regardless, this was one of the most useful ways I used Stable Diffusion before switching to the web UI. It'd be amazing to have both the web UI and latent perturbations.

R-N commented 2 years ago

That sounds great.

Might this be relevant? https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/4021

acid1103 commented 2 years ago

@R-N After playing around with that and experimenting with setting the initial latent, it seems that only PLMS and DDIM work with this method. I know very little about the inner workings of schedulers and Stable Diffusion in general, so unfortunately I think this will have to be done by someone else. I doubt I have time to learn enough about these things to make the necessary changes or suggestions.

To answer your specific question: the whole idea of latent perturbation is that by subtly perturbing the initial latent, you subtly perturb the resulting image. But using the CFG denoiser callback to set the initial latent to a known state still results in completely random final images. Identical initial latents should produce identical output images, but that doesn't happen with any of the sampling methods that call the CFG denoiser callback. So a different approach will probably be needed.
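For anyone who wants to check the "identical latents in, identical images out" property on a given sampling method, here is a minimal sketch of the test. The two samplers are toy stand-ins I wrote for illustration, not the web UI's samplers, and `is_reproducible` is a hypothetical helper. I don't know that per-step noise is what breaks the callback approach; the toy only shows that any sampler drawing fresh noise per step fails the check unless its random state is fixed too.

```python
import numpy as np

def is_reproducible(sample_fn, latent, runs=3):
    """Run the sampler several times from copies of the same latent and compare."""
    first = sample_fn(latent.copy())
    return all(np.array_equal(first, sample_fn(latent.copy()))
               for _ in range(runs - 1))

def deterministic_sampler(latent, steps=10):
    # Toy sampler: the output depends only on the initial latent.
    x = latent
    for _ in range(steps):
        x = 0.9 * x
    return x

def make_noisy_sampler(rng):
    # Toy sampler that injects fresh noise at every step from a shared RNG.
    def sampler(latent, steps=10):
        x = latent
        for _ in range(steps):
            x = 0.9 * x + 0.1 * rng.standard_normal(x.shape)
        return x
    return sampler

latent = np.random.default_rng(0).standard_normal((4, 8, 8))
print(is_reproducible(deterministic_sampler, latent))                         # True
print(is_reproducible(make_noisy_sampler(np.random.default_rng(1)), latent))  # False
```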