AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Feature Request]: X/Y/Z plot - optimize workflow #16278

Open ronnihh opened 1 month ago

ronnihh commented 1 month ago

What would your feature do?

Right now X/Y/Z plot doesn't seem to optimize its workflow, and it potentially creates a lot of unnecessary work, depending on which parameters are plotted. The script should identify and use the appropriate opportunities for saving work and time.

Proposed workflow

Consider the following scenario: plotting the difference that Hires steps make to an image.

Select "Hires steps" as X-type and input "25,40,50,60,75" as X-values. This works exactly as intended. But instead of creating the initial image once and then running hires fix 5 times on it (which would save 4 first-pass runs in this scenario), each loop generates the same initial image again and then applies hires fix to it.

In this scenario it would seem like a better solution to generate the initial image once and then use that same image for all 5 hires fix passes.
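
The idea can be sketched roughly as follows. This is only an illustration of the proposed workflow, not webui code: `plot_hires_steps`, `fake_first_pass` and `fake_hires_pass` are hypothetical stand-ins, and the "images" are plain strings so the call counts are easy to see.

```python
from typing import Callable, List

def plot_hires_steps(
    first_pass: Callable[[], str],
    hires_pass: Callable[[str, int], str],
    hires_steps_values: List[int],
) -> List[str]:
    """Run the first pass once and reuse its result for every Hires steps value."""
    base = first_pass()  # generated a single time
    return [hires_pass(base, steps) for steps in hires_steps_values]

# Hypothetical stand-ins that only count how often each stage runs.
calls = {"first_pass": 0, "hires_pass": 0}

def fake_first_pass() -> str:
    calls["first_pass"] += 1
    return "base-image"

def fake_hires_pass(base: str, steps: int) -> str:
    calls["hires_pass"] += 1
    return f"{base}+hires{steps}"

results = plot_hires_steps(fake_first_pass, fake_hires_pass, [25, 40, 50, 60, 75])
print(calls)  # {'first_pass': 1, 'hires_pass': 5}
```

With the current behaviour described above, the first pass would run 5 times instead of once.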

I know not all scenarios are as easy to optimize, especially when X, Y and Z are all used in the plot. But when plotting only one parameter, or even two, there seem to be quite a few cases where savings are possible.

A different scenario: X-type "Hires upscaler" with 3 different upscalers, Y-type "Denoising" with 3 different values, and Z-type "Hires sampler" with 3 different samplers. This generates the same initial image 27 times (3 x 3 x 3), each time running the full number of first-pass steps. That's a lot of seemingly unnecessary steps.
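
To put a number on it, here is a small back-of-the-envelope calculation. The first-pass step count of 20 is an assumption for illustration only (the post doesn't give one), and `first_pass_runs` is a hypothetical helper. The "reuse" case assumes every plotted axis only affects the hires pass.

```python
from math import prod

def first_pass_runs(axis_lengths, reuse_first_pass=False):
    """Number of times the initial image is generated for a grid."""
    cells = prod(axis_lengths)  # one grid cell per combination of axis values
    return 1 if reuse_first_pass else cells

axes = (3, 3, 3)        # 3 upscalers x 3 denoising values x 3 samplers
first_pass_steps = 20   # assumed value, not taken from the issue

current = first_pass_runs(axes) * first_pass_steps
optimized = first_pass_runs(axes, reuse_first_pass=True) * first_pass_steps

print(current)              # 540 first-pass steps today
print(optimized)            # 20 first-pass steps if the image were reused
print(current - optimized)  # 520 steps saved
```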

These axis types seem likely to benefit from such an optimization:

  1. Hires upscaler
  2. Hires steps
  3. Hires sampler
  4. Denoising
  5. VAE - maybe?

And perhaps even others?

Additional information

Thanks a lot for creating a great piece of software BTW.

w-e-w commented 1 month ago

The XYZ grid script works on the premise that it operates at an outer level: it is basically a script that runs the image generation pipeline multiple times with different inputs. This allows the XYZ script to be relatively simple and stable, with good compatibility with extensions.
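
For readers unfamiliar with that design, the "outer level" idea looks roughly like this. It is a simplified sketch, not the actual script: `run_grid`, `fake_generate` and the parameter keys are hypothetical, and the whole pipeline is treated as an opaque `generate` callable.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Sequence, Tuple

Axis = Tuple[str, Sequence[Any]]  # (parameter name, values to try)

def run_grid(
    base_params: Dict[str, Any],
    axes: List[Axis],
    generate: Callable[[Dict[str, Any]], Any],
) -> List[Any]:
    """Call the full pipeline once per grid cell, changing only the inputs."""
    names = [name for name, _ in axes]
    images = []
    for combo in product(*(values for _, values in axes)):
        params = dict(base_params)        # fresh copy for every cell
        params.update(zip(names, combo))  # override the plotted parameters
        images.append(generate(params))   # the pipeline itself is a black box
    return images

# Hypothetical stand-in for the pipeline: just echoes the plotted values.
def fake_generate(params: Dict[str, Any]) -> Tuple[str, float]:
    return (params["Hires upscaler"], params["Denoising"])

grid = run_grid(
    {"prompt": "a cat"},
    [("Hires upscaler", ["A", "B"]), ("Denoising", [0.3, 0.5, 0.7])],
    fake_generate,
)
print(len(grid))  # 6
```

Because the loop never looks inside `generate`, it stays simple, but it also has no way to share the first pass between cells.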

To optimize the Hires fix axes, we would need to modify or monkey patch the image generation pipeline to allow a "sub loop" for hires fix. Adding this by patching is possible.

I've done something similar when writing my extension, which allows multiple batches to be run for the hires pass from one first pass. While the sub loop patch in my extension does work, I can't say much about its compatibility with other extensions, and the same would be true if we actually implemented a sub loop for XYZ. Note that my extension is not what you're asking for; I just bring it up to illustrate that the code is very spaghetti.

Adding this via pipeline modification would probably be a better choice, but it runs the risk of breaking other things, such as extensions.

Furthermore, a sub loop would only support certain variable changes, such as Upscaler, Denoising and Hires sampler. Other parameters, such as steps, require recalculating learned_conditioning, which, depending on whether low VRAM mode is used, is calculated earlier in order to prevent swapping models (in other words, the code gets more spaghettified).

And the VAE is not just used for decoding at the end of a job; it's also used to encode the input image when performing img2img and hires fix with a non-latent upscaler. So the only time you can apply the optimization is when you're doing pure txt2img or hires fix with a latent upscaler.
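
The constraints described so far could be written down as a small check like the one below. This is only one reading of the comment, not existing webui logic: `can_reuse_first_pass`, `REUSABLE_AXES` and the flag names are hypothetical, and "Hires steps" is treated as not reusable because of the conditioning recalculation mentioned above.

```python
from typing import Iterable

# Axes that, per the comment above, only change the hires pass.
REUSABLE_AXES = {"Hires upscaler", "Denoising", "Hires sampler"}

def can_reuse_first_pass(
    axis_types: Iterable[str],
    *,
    is_img2img: bool = False,
    latent_upscaler: bool = True,
) -> bool:
    """Return True if every plotted axis leaves the first pass unchanged."""
    for axis in axis_types:
        if axis in REUSABLE_AXES:
            continue
        # VAE is only safe when it is never used to encode an image:
        # txt2img (no img2img input) with a latent upscaler.
        if axis == "VAE" and not is_img2img and latent_upscaler:
            continue
        return False  # e.g. "Hires steps", "Steps", "Seed", ...
    return True

print(can_reuse_first_pass(["Hires upscaler", "Denoising", "Hires sampler"]))  # True
print(can_reuse_first_pass(["Hires steps"]))                                   # False
print(can_reuse_first_pass(["VAE"], latent_upscaler=False))                    # False
```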

Don't quote me on this, in case I missed something.

Depending on the exact axis value, more changes might need to be made.


Basically, my view is that while optimization is possible, it is really not worth it. For one, it only improves relatively few axes. Second, it may introduce more instability or compatibility issues with the XYZ script or the pipeline. I think XYZ is a tool mainly useful for testing parameters, and it's not something that you use all the time. Keeping the XYZ tool simple and stable is more important than having it highly optimized for certain specific scenarios.


As an alternative, since hires fix is basically just txt2img -> upscale -> img2img, you could do this optimization manually by first generating the txt2img result, then sending the image to img2img and setting up XYZ for img2img.


If someone is able to make this work without introducing too much spaghetti, go ahead; don't let what I said stop you.

ronnihh commented 1 month ago

Thank you very much for a thoughtful and detailed answer. Unfortunately my Python skills aren't up to the task quite yet, so I will have to let someone more qualified delve into the problem instead.