acid1103 opened this issue 2 years ago
That sounds great.
Might this be relevant? https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/4021
@R-N After playing around with that and experimenting with setting the initial latent, it seems like only PLMS and DDIM work with this method. I know very little about the inner workings of schedulers or Stable Diffusion in general, so unfortunately I think this will have to be done by someone else; I doubt I have the time to learn enough about these things to make the necessary changes or suggestions.
To answer your specific question: the whole idea of latent perturbation is that, by subtly perturbing the initial latent, you subtly perturb the resulting image. But using the CFG denoiser callback to set the initial latent to a known state still produces completely random final images. Identical initial latents should yield identical output images, yet that doesn't happen with any of the samplers that invoke the CFG denoiser callback. So a different approach will probably need to be taken.
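For reference, the experiment above looks roughly like the sketch below, assuming the callback from the linked PR exposes the current latent as `params.x` and the step index as `params.sampling_step` (both names are assumptions about that API):

```python
import torch
from modules import script_callbacks  # webui-internal module (assumed import path)

fixed_latent = None  # a torch.Tensor captured from an earlier run

def force_initial_latent(params):
    # On the very first sampling step, replace the latent with a known state.
    # If sampling were deterministic given the initial latent, this should
    # reproduce the earlier image exactly, but it doesn't.
    if params.sampling_step == 0 and fixed_latent is not None:
        params.x = fixed_latent.to(params.x.device, params.x.dtype)

script_callbacks.on_cfg_denoiser(force_initial_latent)
```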
Is there an existing issue for this?
What would your feature do?
A video outlining the requested feature can be found here (timestamp included).
This is somewhat related to these two issues:
#2163
#3745
In essence, latent perturbation would provide finer-grained control over image variations than the current variations implementation, and it would allow for recursive variations (at the expense of being able to record the generation parameters in PNG metadata). It works by taking the initial latent handed to the scheduler, perturbing it according to some scale factor, and running the perturbed latent through the scheduler. This produces a variation of the image generated by the original latent, where the degree of difference between the two images is determined by the scale factor.
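To illustrate the mechanism (a sketch only; `perturb_latent` and its parameters are hypothetical names, not an implementation proposal):

```python
import torch

def perturb_latent(latent: torch.Tensor, scale: float, seed: int = 0) -> torch.Tensor:
    # Draw fresh Gaussian noise with the same shape as the initial latent.
    generator = torch.Generator().manual_seed(seed)
    noise = torch.randn(latent.shape, generator=generator).to(latent.device, latent.dtype)
    # Blend the original latent toward the noise: scale=0 returns the
    # original latent unchanged, scale=1 returns pure fresh noise.
    return (1.0 - scale) * latent + scale * noise
```

Note that a plain linear blend slightly shrinks the variance of the result; a slerp between the latent and the fresh noise, similar to what the existing variation-seed code appears to use, would preserve the noise magnitude better.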
Providing latent perturbation would allow something similar to a binary search through image space. Essentially, you start with an image that's "good enough." Take its initial latent and generate a few perturbed variations at a relatively high scale factor. If you like one of the results, take that image's initial latent, perturb it by a smaller scale factor, and generate a new batch of slightly less varied images. Repeat this process until you've found the perfect image.
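Continuing the hypothetical sketch above, that search might look like:

```python
import torch

initial_latent = torch.randn(1, 4, 64, 64)  # stand-in for the latent behind the "good enough" image

best_latent = initial_latent
scale = 0.5  # starting perturbation strength (an assumed value)
for _ in range(4):
    candidates = [perturb_latent(best_latent, scale, seed=i) for i in range(4)]
    # ...decode each candidate with the sampler, inspect the images,
    # and set best_latent to whichever candidate you like best...
    scale /= 2  # narrow the search, binary-search style
```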
As mentioned, the downside is that embedding these steps in PNG metadata is infeasible. Personally, that's a tradeoff I'm okay with, but I realize it might be a point of contention.
Proposed workflow
Additional information
Obviously this is a relatively big ask. I've looked through the code, and the way initial latents are currently generated doesn't lend itself well to this, to say nothing of the workflow and UI changes that would be required. Personally, I would be more than happy with a relatively simple change that let me achieve this functionality with a script. Regardless, this was one of the most useful ways I used Stable Diffusion before switching to the web UI, and it would be amazing to have latent perturbation alongside everything the web UI already offers.