Open briansemrau opened 1 year ago
To use it with Stable Diffusion, you can take the generated latent from Stable Diffusion and pass it into the upscaler before decoding with your standard VAE. Or you can take any image, encode it into the latent space, use the upscaler, and decode it.
I don't think that it'll work exactly like the existing upscalers. It's almost like an img2img model that takes the latent tensor instead of an image.
Rather than that, it sounds like it's designed to upscale the txt2img/img2img output latents prior to VAE decoding. So rather than a post-processing upscaling step, it's being inserted into the middle of a normal SD output workflow.
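For reference, a minimal sketch of that workflow, assuming the Hugging Face `diffusers` library and a CUDA GPU. The base checkpoint ID, the seed, and the step count are assumptions chosen for illustration, and `generate_and_upscale` is a hypothetical helper name, not something from the web UI.

```python
BASE_MODEL_ID = "CompVis/stable-diffusion-v1-4"  # assumption: any compatible SD checkpoint
UPSCALER_ID = "stabilityai/sd-x2-latent-upscaler"  # the model linked in this issue


def generate_and_upscale(prompt, seed=33, steps=20, device="cuda"):
    """Generate latents with a base SD pipeline, then upscale them 2x in latent space."""
    # Imported lazily so this file can be loaded without the GPU stack installed.
    import torch
    from diffusers import (
        StableDiffusionLatentUpscalePipeline,
        StableDiffusionPipeline,
    )

    base = StableDiffusionPipeline.from_pretrained(
        BASE_MODEL_ID, torch_dtype=torch.float16
    ).to(device)
    upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained(
        UPSCALER_ID, torch_dtype=torch.float16
    ).to(device)

    generator = torch.manual_seed(seed)

    # output_type="latent" skips the base pipeline's VAE decode and
    # returns the raw latent tensor instead of an image.
    low_res_latents = base(
        prompt, generator=generator, output_type="latent"
    ).images

    # The upscaler consumes the latents and returns a decoded PIL image.
    result = upscaler(
        prompt=prompt,
        image=low_res_latents,
        num_inference_steps=steps,
        guidance_scale=0,
        generator=generator,
    )
    return result.images[0]
```

Calling `generate_and_upscale("a photo of an astronaut")` downloads both models on first use, so expect it to take a while.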
> To use it with Stable Diffusion, you can take the generated latent from Stable Diffusion and pass it into the upscaler before decoding with your standard VAE.
Don't the included latent upscalers work in a similar vein, upscaling the latent and feeding that into the upscale process? In this case, if this were implemented, denoising in that second step wouldn't necessarily be needed.
Edit: Actually, the way the pipeline works, it gives you the upscaled image directly. So you could denoise it further, but as I mentioned, it may not be needed.
I've implemented this now but the included VAE seems particularly awful for some reason. Maybe I can replace it with the current one in use by the web UI. I'll post some comparisons later.
> but the included VAE seems particularly awful for some reason
I was judging this based on the fact that faces turn out bad with it, but it turns out that's listed as a limitation.
> Faces and people in general may not be generated properly.
After experimenting a bit more, it doesn't seem that great compared to the other upscalers we have now, imo. GAN upscalers still seem superior, and even LDSR, based on diffusion, looks a lot better. Comparison below is using https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/4446 for the Latent Diffusion upscaler. I didn't replace the VAE for the SD x2 upscaler in this comparison, but when I did replace it, that didn't fix fundamental issues like the face and such.
Frankly, I'm not interested in making a PR for this given these results.
Is there an existing issue for this?
What would your feature do?
Implement https://huggingface.co/stabilityai/sd-x2-latent-upscaler
Allows 2x upscaling in latent space
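To make the 2x figure concrete, here is a small sketch of the shape arithmetic. It assumes the standard Stable Diffusion VAE, which downsamples by a factor of 8 into 4 latent channels; `upscaled_shapes` is a hypothetical helper written for illustration only.

```python
def upscaled_shapes(width, height, vae_factor=8, latent_channels=4, upscale=2):
    """Return (latent shape, upscaled latent shape, decoded image size).

    Latent shapes are (channels, height, width); the image size is (width, height).
    """
    if width % vae_factor or height % vae_factor:
        raise ValueError("width and height must be multiples of the VAE factor")

    # The VAE encoder shrinks each spatial dimension by vae_factor.
    latent = (latent_channels, height // vae_factor, width // vae_factor)

    # The latent upscaler doubles each spatial dimension of the latent.
    upscaled_latent = (latent_channels, latent[1] * upscale, latent[2] * upscale)

    # Decoding multiplies the spatial size back up by vae_factor.
    image_size = (upscaled_latent[2] * vae_factor, upscaled_latent[1] * vae_factor)
    return latent, upscaled_latent, image_size


print(upscaled_shapes(512, 512))
# ((4, 64, 64), (4, 128, 128), (1024, 1024))
```

So under those assumptions a 512x512 generation becomes a 1024x1024 image, with the upscaling done on a 64x64 latent rather than on pixels.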
Proposed workflow
Should be an upscaling option like the other methods provided.
Additional information
No response