[Feature Request]: Stable Diffusion x2 latent upscaler

AUTOMATIC1111 / stable-diffusion-webui

[Feature Request]: Stable Diffusion x2 latent upscaler #7680

Open briansemrau opened 1 year ago

briansemrau commented 1 year ago

Is there an existing issue for this?

[X] I have searched the existing issues and checked the recent builds/commits

What would your feature do?

Implement https://huggingface.co/stabilityai/sd-x2-latent-upscaler

Allows 2x upscaling in latent space
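To make "2x upscaling in latent space" concrete, here is a rough sketch of the shape arithmetic involved. It assumes the usual Stable Diffusion VAE layout (4 latent channels, 8x spatial downsampling); the helper names are hypothetical and are not part of the web UI or of any library.

```python
# Assumptions (standard Stable Diffusion v1/v2 VAE, not taken from this issue):
# 4 latent channels and an 8x spatial reduction between pixels and latents.
LATENT_CHANNELS = 4
VAE_SCALE = 8
UPSCALE = 2  # the x2 latent upscaler doubles latent height and width


def latent_shape(width, height):
    """Return the (C, H, W) latent shape for a width x height image."""
    if width % VAE_SCALE or height % VAE_SCALE:
        raise ValueError("width and height must be multiples of 8")
    return (LATENT_CHANNELS, height // VAE_SCALE, width // VAE_SCALE)


def upscaled_latent_shape(shape):
    """Return the latent shape after the x2 latent upscaler."""
    channels, h, w = shape
    return (channels, h * UPSCALE, w * UPSCALE)


def decoded_size(shape):
    """Return the (width, height) in pixels after VAE decoding."""
    _, h, w = shape
    return (w * VAE_SCALE, h * VAE_SCALE)


low = latent_shape(512, 512)
high = upscaled_latent_shape(low)
print(low, high, decoded_size(high))
# prints: (4, 64, 64) (4, 128, 128) (1024, 1024)
```

So a 512x512 generation would be upscaled as a 64x64 latent to 128x128 and only then decoded to 1024x1024 pixels.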

Proposed workflow

Should be an upscaling option like the other methods provided.

Additional information

No response

ProGamerGov commented 1 year ago

To use it with Stable Diffusion, you can take the generated latent from Stable Diffusion and pass it into the upscaler before decoding with your standard VAE. Or you can take any image, encode it into the latent space, use the upscaler, and decode it.

I don't think that it'll work exactly like the existing upscalers. It's almost like an img2img model that takes the latent tensor instead of an image.
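For reference, the first workflow described above (generate a latent, upscale it, then decode) can be sketched with the Hugging Face `diffusers` library roughly as follows. This is not how the web UI would integrate it. The base checkpoint ID, the seed, and the function name `generate_and_upscale` are assumptions made for illustration, and the step count and guidance scale are only example values.

```python
# Assumption: any Stable Diffusion checkpoint could be used as the base model.
SD_MODEL_ID = "CompVis/stable-diffusion-v1-4"
UPSCALER_ID = "stabilityai/sd-x2-latent-upscaler"


def generate_and_upscale(prompt, seed=33, device="cuda"):
    """Generate a latent with SD, upscale it 2x in latent space, return a PIL image."""
    # Heavy imports live inside the function so that defining it stays cheap.
    import torch
    from diffusers import (
        StableDiffusionLatentUpscalePipeline,
        StableDiffusionPipeline,
    )

    base = StableDiffusionPipeline.from_pretrained(
        SD_MODEL_ID, torch_dtype=torch.float16
    ).to(device)
    upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained(
        UPSCALER_ID, torch_dtype=torch.float16
    ).to(device)

    generator = torch.manual_seed(seed)

    # Ask the base pipeline for latents instead of a decoded image.
    low_res_latents = base(
        prompt, generator=generator, output_type="latent"
    ).images

    # The upscaler takes the latents directly and decodes the result itself.
    return upscaler(
        prompt=prompt,
        image=low_res_latents,
        num_inference_steps=20,
        guidance_scale=0,
        generator=generator,
    ).images[0]
```

Calling `generate_and_upscale("a photo of an astronaut")` would need a CUDA GPU and would download both models on first use.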

Cyberbeing commented 1 year ago

Rather than that, it sounds like it's designed to upscale txt2img/img2img output latent prior to VAE decoding. So rather than a post-processing upscaling step, it's being inserted into the middle of a normal SD output workflow.

> To use it with Stable Diffusion, you can take the generated latent from Stable Diffusion and pass it into the upscaler before decoding with your standard VAE.

catboxanon commented 1 year ago

Don't the included latent upscalers work in a similar vein, upscaling the latent and feeding that into the upscale process? If this were implemented, denoising for that second step wouldn't necessarily be needed.

Edit: Actually, the way the pipeline works, it gives you the upscaled image directly. So you could denoise it further, but as I mentioned, that may not be needed.

catboxanon commented 1 year ago

I've implemented this now but the included VAE seems particularly awful for some reason. Maybe I can replace it with the current one in use by the web UI. I'll post some comparisons later.

catboxanon commented 1 year ago

> but the included VAE seems particularly awful for some reason

I was judging this based on the fact that faces turn out badly with it, but it turns out that's listed as a limitation.

> Faces and people in general may not be generated properly.

After experimenting a bit more, it doesn't seem that great compared to the other upscalers we have now, imo. GAN upscalers still seem superior, and even LDSR, which is based on diffusion, looks a lot better. The comparison below uses https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/4446 for the Latent Diffusion upscaler. I didn't replace the VAE for the SD x2 upscaler in this comparison, but when I did replace it, that didn't fix the fundamental issues such as the faces.

[Image: xyz_grid-0001-2870305590]

Frankly, I'm not interested in making a PR for this given these results.