Acly / krita-ai-diffusion

Streamlined interface for generating images with AI in Krita. Inpaint and outpaint with optional text prompt, no tweaking required.
https://www.interstice.cloud
GNU General Public License v3.0

Less sharp inpaint because of unwanted bilinear scaling #679

Closed · Danamir closed this issue 6 months ago

Danamir commented 6 months ago

Not sure if this is a bug or by design, but when doing a smallish inpaint (i.e. a face), the refine_region method does this at the start:

    in_image = scale_to_initial(extent, w, in_image, models)

This method in turn uses extent.initial_scaling, which by default executes the following code:

        if ratio < 1:
            return ScaleMode.resize

This is good for masks, but not for the source image, which becomes blurry because of the bilinear scaling used by ScaleMode.resize. The inpainted region is then not as sharp as it could be. Shouldn't ScaleMode.upscale_fast be used here?

I tried altering the code of extent.initial_scaling, and the inpaints are indeed much sharper after the modification, at a minimal computing cost.
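
Roughly, the change I tested looks like this (a paraphrased sketch of the relevant branch, not a verbatim patch):

```python
if ratio < 1:
    return ScaleMode.upscale_fast  # was: ScaleMode.resize (bilinear)
```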

Source region before resize:

[image: source region]

| Scale method | Scaled image | Rendering output | Scaled output |
| --- | --- | --- | --- |
| Bilinear (i.e. ScaleMode.resize) | [image] | [image] | [image] |
| OmniSR_X4_DIV2K (i.e. ScaleMode.upscale_fast) | [image] | [image] | [image] |
| UltraSharp 4x | [image] | [image] | [image] |

(I added UltraSharp 4x to give an example of an even sharper scaler).

Danamir commented 6 months ago

On the same subject, when adding ImageScale nodes we use the bilinear algorithm by default. I would argue that using lanczos (or at least bicubic) is preferable in almost every case; it is still not as sharp as using an upscale model, but way better than bilinear.

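In API-format workflow JSON the fix would be a single-input change; a hypothetical sketch, with the node id and target size as placeholders:

```python
# ComfyUI API-format node for the scale step; only upscale_method changes.
node = {
    "class_type": "ImageScale",
    "inputs": {
        "image": ["12", 0],           # placeholder reference to the upstream image
        "upscale_method": "lanczos",  # the plugin currently emits "bilinear"
        "width": 1024,                # placeholder target size
        "height": 1024,
        "crop": "disabled",
    },
}
```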

Danamir commented 6 months ago

In fact the final ImageScale of the workflow also uses bilinear, which causes aliasing when scaling down, mostly visible on clean anime line art. In this instance the bicubic method is as bad as the bilinear one, but the results with lanczos are very nice.

I think we should simply update all calls to lanczos, for downscaling and upscaling alike. Do you think there is a downside to this?

In the following pictures, pay close attention to the dark hair lines on the light background to see the aliasing:

| Scaler | Downscaled image |
| --- | --- |
| bilinear | [image] |
| bicubic | [image] |
| lanczos | [image] |
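
The difference is easy to reproduce outside the plugin, e.g. with Pillow (a rough sketch; the filename is a placeholder, and Pillow's filters only approximate ComfyUI's implementations):

```python
# Downscale the same render with each filter and compare the line art.
from PIL import Image

src = Image.open("render.png")  # placeholder: any high-res render with clean lines
half = (src.width // 2, src.height // 2)

for name, resample in [("bilinear", Image.Resampling.BILINEAR),
                       ("bicubic", Image.Resampling.BICUBIC),
                       ("lanczos", Image.Resampling.LANCZOS)]:
    src.resize(half, resample).save(f"down_{name}.png")
```
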
Acly commented 6 months ago

So this is the case where the image (region) size is smaller than the checkpoint resolution. The workflow becomes "Upscale -> Diffuse -> Downscale". The main point of this mechanism is to avoid bad generation at low resolutions, since checkpoints generally can't handle them and produce garbage. It's not meant to sharpen results in the area; ideally they should be consistent with the lower resolution used by the canvas. That's why an AI upscaler is not used here - if the render result appears a bit blurry, that may actually be ok, because it's downscaled afterwards. AI upscalers can also have detrimental effects, so I want to be careful about using them automatically unless >50% diffusion happens afterwards, or it's explicitly desired.
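
Schematically, the mechanism looks like this (a simplified illustration with made-up names, not the actual plugin code):

```python
# Regions below the checkpoint's native resolution take the round-trip.
def plan_region(width: int, height: int, native: int = 1024) -> list[str]:
    if min(width, height) < native:
        factor = native / min(width, height)
        return [f"upscale x{factor:.2f}", "diffuse", f"downscale x{1 / factor:.2f}"]
    return ["diffuse"]

print(plan_region(512, 640))    # small face region -> upscale/diffuse/downscale
print(plan_region(1280, 1024))  # large enough -> diffuse directly
```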

Viewed from another angle: if you want sharper results of better quality, the recommendation would be to increase the canvas size, using an upscaler of your choice. Then there is no implicit upscale before diffusion, and more importantly, you get to keep the full-resolution render output instead of throwing away most pixels. This is especially important for follow-up refines; the upscale/diffuse/downscale workflow will always have detrimental effects, regardless of which upscaler is used.

I do very much agree with using Lanczos over Bilinear. I think it should mostly fix the case where bilinear introduces additional blur on top of the resolution change. It wasn't available in Comfy when I initially implemented this, and it has since drowned somewhere on the TODO list...

Danamir commented 6 months ago

> I do very much agree with using Lanczos over Bilinear. I think it should mostly fix the case where bilinear introduces additional blur on top of the resolution change.

Yes, I think that would be a sufficient fix. I wasn't aware of the problem until after posting the first issue and continuing to poke around in the workflow debug output.

> Viewed from another angle: if you want sharper results of better quality, the recommendation would be to increase the canvas size, using an upscaler of your choice.

I tend to work at a lower resolution and use inpaint as a detailer tool. But I get that it is not a recommended usage, so no worries if it is not fully supported in the plugin.

Using an upscaler model is kind of overkill, but I still like the idea because it has a comparable feel to using the detailer nodes in ComfyUI. Fortunately I keep a custom fork for those kinds of hacks. 😁

Acly commented 6 months ago

> I tend to work at a lower resolution and use inpaint as a detailer tool.

There was this issue where I tried to convince some A1111 veterans that it's an anti-pattern. Mostly without success I think ^^

It's interesting because there are also people here who come from Krita/Photoshop and are new to SD, and of course the canvas or photo is like 6000x4000; it's normal, and why should they downscale their painting or photos for SD? And I think it's a good point. Different mediums work at different resolutions, but the canvas should match the highest to avoid losing information. And there should be no downscales for the same reason. SD generation can be upscaled where needed, because native resolutions that high are often just not practical - but preferably not the other way round.

Ideally the only point where you downscale anything is the final "export to PNG for reddit post" step. And I'd like to make that more realistic, but admittedly not quite there yet.

(Also, those detailer nodes! People make them downscale the high-res result they already computed to insert it into a tiny image... and then upscale that afterwards?!? It makes so much more sense to do it the other way round...)

Drakosha405 commented 6 months ago

> There was this issue where I tried to convince some A1111 veterans that it's an anti-pattern. Mostly without success I think
>
> It's interesting because there are also people here who come from Krita/Photoshop and are new to SD, and of course the canvas or photo is like 6000x4000; it's normal, and why should they downscale their painting or photos for SD? And I think it's a good point. Different mediums work at different resolutions, but the canvas should match the highest to avoid losing information. And there should be no downscales for the same reason. SD generation can be upscaled where needed, because native resolutions that high are often just not practical - but preferably not the other way round.

Working with a large image is always good, but processing (encoding/decoding) such images is too resource-intensive for many users without top-end computers. I just found this add-on, and the first samples showed good results and high speed for me, while keeping the idea of working at maximum resolution: https://github.com/lquesada/ComfyUI-Inpaint-CropAndStitch
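
The idea, as I understand it, is roughly this (a hypothetical sketch using Pillow, not the add-on's actual code; `run_inpaint` stands in for the real encode/diffuse/decode step):

```python
from PIL import Image

def run_inpaint(region: Image.Image, mask: Image.Image) -> Image.Image:
    # Stand-in for the real work: VAE encode -> diffusion -> VAE decode.
    return region

def crop_and_stitch(image: Image.Image, mask: Image.Image, pad: int = 64) -> Image.Image:
    # Process only the masked area plus some context padding, then paste the
    # result back, so the full-resolution canvas never goes through the VAE.
    x0, y0, x1, y1 = mask.getbbox()
    box = (max(x0 - pad, 0), max(y0 - pad, 0),
           min(x1 + pad, image.width), min(y1 + pad, image.height))
    result = run_inpaint(image.crop(box), mask.crop(box))
    image.paste(result, box[:2])
    return image
```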

If it were possible for your application to implement such a feature, I believe many users would appreciate it. I am interested in hearing your thoughts on whether there are any significant drawbacks in this process.

Acly commented 6 months ago

It has worked like that from day 1 - you don't really need those nodes, they just make it more convenient when using Comfy's UI. As for drawbacks, a smaller context is faster but means you lose coherence with the rest of the image.

Acly commented 6 months ago

1.17.2 now uses Lanczos for all image scaling not handled by an upscale model. Masks still use bilinear.