LagPixelLOL / cog-sdxl

SDXL inference with Cog, with support for multiple models in one instance.
MIT License

[FR] Hires Fix (possibly nonviable) #6

Closed: Succubyss closed this issue 5 days ago

Succubyss commented 2 weeks ago

If at all possible, it would be very nice to have Hires Fix functionality. Do you know anything about that?

This might be an upstream limitation that you can't currently work around, but I just thought I'd ask.

LagPixelLOL commented 2 weeks ago

I'm not entirely sure what Hires Fix is, but I think it's just 2-pass generation, where a lower-res image is generated first and then passed through image-to-image. If that's the case, you can implement it on the calling side. If I implement it on the Cog side, it would take too long to generate an image, and not all images need to be upscaled with image-to-image. If the image quality is bad, you can choose not to use Hires Fix on that image, but if it's implemented in the server, you can't know whether the image is bad or not beforehand.
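If it is just a two-pass flow, the calling side could sketch it like this (a minimal sketch: the `generate`/`img2img` stand-ins, the 832x1216 base size, and the 1.5x scale are illustrative assumptions, not values from this repo):

```python
from PIL import Image

def hires_fix(generate, img2img, width=832, height=1216, scale=1.5, strength=0.5):
    """Two-pass 'Hires Fix' on the calling side: generate at a base
    resolution, upscale in pixel space, then refine through img2img.
    `generate` and `img2img` stand in for calls to the Cog endpoint."""
    base = generate(width, height)                   # pass 1: txt2img
    new_size = (int(width * scale) // 8 * 8,         # round to a multiple
                int(height * scale) // 8 * 8)        # of 8 for the VAE
    upscaled = base.resize(new_size, Image.LANCZOS)  # pixel-space upscale
    return img2img(upscaled, strength)               # pass 2: img2img

# Stand-ins so the sketch runs without a model or GPU:
fake_generate = lambda w, h: Image.new("RGB", (w, h))
fake_img2img = lambda img, strength: img

out = hires_fix(fake_generate, fake_img2img)
print(out.size)  # (1248, 1824)
```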

Succubyss commented 2 weeks ago

I think it's more than just 2-pass; they use upscaler models like R-ESRGAN 4x+ Anime6B or 4x-UltraSharp.

> it would take too long to generate an image

The Hires steps are generally far fewer than the original steps, around 10 or so. Here's an example of someone showing their recommended generation settings that incorporate Hires Fix: https://civitai.com/models/780607/morgymix?modelVersionId=873021

> then you can't know whether the image is bad or not

Acceptable, in my opinion. It generally fixes a lot of issues with hands and such, so a bad image can potentially become a good one.

LagPixelLOL commented 2 weeks ago

I have tried those two 4x upscalers, but they are way worse than upscaling via image-to-image, and from my personal usage it's better if I can decide whether to upscale the image instead of always upscaling.

Also, I consider this repo more of a wrapper for the Diffusers API than a WebUI like A1111 or SD.Next, so features like automatic upscaling are better implemented on the calling side (Discord bot, WebUI deployment, etc.).

Succubyss commented 1 week ago

> I have tried those two 4x upscalers, but they are way worse than upscaling via image-to-image, and from my personal usage it's better if I can decide whether to upscale the image instead of always upscaling.
>
> Also, I consider this repo more of a wrapper for the Diffusers API than a WebUI like A1111 or SD.Next, so features like automatic upscaling are better implemented on the calling side (Discord bot, WebUI deployment, etc.).

The idea is that it upscales the image before sending it to img2img, and these upscalers are made to produce sharp outputs in which details don't get blended together, so img2img gives you a better result than just handing it a lower-res image and telling it to make a higher-res one.

Succubyss commented 1 week ago

Reading further, it seems that if you don't upscale beforehand, the image goes through "latent upscaling," which is inherently noisier and produces results that diverge much further from the original image than a proper upscaler would.

Succubyss commented 1 week ago

Here's a comment that talks about it: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/7001#discussioncomment-5733878

EDIT: The first URL was a bad example because it didn't provide the context that the comment I changed the link to does.

LagPixelLOL commented 1 week ago

I don't upscale in latent space: if you provide an image smaller than the set resolution, it's first scaled up with PIL in pixel space, then converted to latent space by the VAE. The bicubic algorithm should produce a reasonably sharp upscaled image without a GAN-based upscaler, which takes a lot of time to execute.
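The pixel-space path described above can be sketched with plain PIL (the 1024x1024 target and the generated input image are illustrative assumptions; only the resize step reflects what's described):

```python
from PIL import Image

# Stand-in for a user-provided image smaller than the set resolution.
img = Image.new("RGB", (512, 768))
target = (1024, 1024)

# If the input is smaller than the target, scale it up in pixel space
# first; the result is then handed to the VAE for latent encoding.
if img.width < target[0] or img.height < target[1]:
    img = img.resize(target, Image.BICUBIC)  # Image.LANCZOS is a drop-in swap

print(img.size)  # (1024, 1024)
```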

My optimal route is actually a 3-pass generation: first generate at the typical SDXL size, then upscale the image to 1.5x the typical SDXL size and do a pass with strength 0.5, then upscale to 2x the typical SDXL size and do a pass with strength 0.3. This way it still produces a near-identical image while getting two chances to fix the hands, fingers, etc., without picking up weird anatomy from an unstable high-res pass at a higher strength.
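Laid out as data (the 1.5x/2x scales and 0.5/0.3 strengths are from the description above; the 1024x1024 base is my assumption for the "typical SDXL size"):

```python
# Three-pass route: base txt2img, then two img2img refinement passes
# at increasing resolution and decreasing strength.
BASE = 1024  # assumed "typical SDXL size"
passes = [
    {"scale": 1.0, "strength": None},  # pass 1: plain generation
    {"scale": 1.5, "strength": 0.5},   # pass 2: upscale 1.5x, img2img
    {"scale": 2.0, "strength": 0.3},   # pass 3: upscale 2x, img2img
]
for i, p in enumerate(passes, 1):
    side = int(BASE * p["scale"])
    print(f"pass {i}: {side}x{side}, strength={p['strength']}")
```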

Though if you really want to use a GAN to upscale first, I recommend using the version hosted by Replicate, as implementing it in this repo wouldn't be faster or easier to use than the one hosted by Replicate.

Succubyss commented 1 week ago

Oh, I didn't realize you were using PIL for upscaling; that's good to know. Bicubic is really... not great, though. Why not use Lanczos at the very least?

As an aside, I'm not sure if you noticed, but I commented on the closed Image URL issue. The issue is that the image parameter is a Path object, which doesn't handle URIs.

LagPixelLOL commented 1 week ago

> Bicubic is really... not great, though.

Welp, I thought it was good for upscaling; I can change it to Lanczos in the next version.

> The issue is that the image parameter is a Path object, which doesn't handle URIs.

From my testing it does accept URLs.

```shell
curl https://api.replicate.com/v1/predictions \
  -H "Authorization: Bearer r8_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{"version": "cd3fa6fc3ad24bff24a0f92a65903f5625eebbd6e00c2be98cfe9be34daa62c6", "input": {"image": "https://tjzk.replicate.delivery/models_models_cover_image/bfe22219-05c7-431d-b20b-8637433a9d28/0.png"}}'
```

This works.
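The same request expressed in Python (version hash, URL, and placeholder token are copied from the curl above; the actual POST is commented out so nothing is sent without a real token):

```python
import json

payload = {
    "version": "cd3fa6fc3ad24bff24a0f92a65903f5625eebbd6e00c2be98cfe9be34daa62c6",
    "input": {
        # A plain URL works here because cog's Path input accepts and
        # downloads URIs server-side, unlike pathlib.Path.
        "image": "https://tjzk.replicate.delivery/models_models_cover_image/bfe22219-05c7-431d-b20b-8637433a9d28/0.png",
    },
}
body = json.dumps(payload)

# import requests
# requests.post(
#     "https://api.replicate.com/v1/predictions",
#     headers={"Authorization": "Bearer r8_xxxxx",
#              "Content-Type": "application/json"},
#     data=body,
# )
```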

Succubyss commented 1 week ago

> The issue is that the image parameter is a Path object, which doesn't handle URIs.

> From my testing it does accept URLs.

Oh... fuck, I see the issue. This is the Path object from cog, not from pathlib. I was doing internal testing and I fucked up my imports, assuming Path was coming from pathlib. My mistake.

> Welp, I thought it was good for upscaling; I can change it to Lanczos in the next version.

Awesome, thanks a bunch.

LagPixelLOL commented 5 days ago

https://github.com/LagPixelLOL/cog-sdxl/commit/39c8d54e23a0c20c764ac6d81e28fff052cbee7c