A fundamental question: does realesrgan-gui perform multiple upscales when the "scale to width" or "scale to height" value is greater than the size produced by the first pass of the upscale model, or does it run a single upscale and ignore the value?
I don't know what "upscale it without AI" means.
Multiple upscales are performed if a single upscale doesn't reach the specified resolution. If you upscale a 900px image to 4K with a 4x model, the GUI will run Real-ESRGAN twice: upscale 900px to 3600px, then 3600px to 14400px, then downsample to 4K. This is also what waifu2x-caffe does, as you can notice from its upscaling time, which jumps whenever another pass becomes necessary: … = 1.99x = 2.00x < 2.01x = … = 3.99x = 4.00x < 4.01x = …
You may think that upscaling from 3600px to 14400px with Real-ESRGAN is a waste in this case, but I see it more as a compromise. Upscaling from 3600px to 4K with Lanczos, bicubic, or anything else instead of running Real-ESRGAN makes no sense.
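To make the arithmetic concrete, here is a minimal sketch of the pass count for this route (not the GUI's actual code; 3840px stands in for 4K):

```python
import math

# Keep running the 4x model until the target is reached, then downsample.
src, target, scale = 900, 3840, 4
passes = math.ceil(math.log(target / src, scale))  # ceil(log4(4.27)) = 2
final = src * scale ** passes                      # 900 * 4**2 = 14400
print(passes, final)  # 2 passes -> 14400px, then downsample to 3840px
```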
> I don't know what "upscale it without AI" means.
It means simple resizing with classical algorithms to the nearest larger dimension.
Suppose I want my picture to be 4096px and my current picture is 900px. The nearest larger dimension is then 1024px (4096 ÷ 4). Before upscaling with the 4x model, I resize my image from 900px to 1024px.
This will save a lot of time.
This also means no more downsampling if "upscale without AI first, followed by AI" is selected.
FYI, XnConvert (or any similar app) takes only a few seconds to resize (Lanczos) dozens of images.
I accept and plan to implement this feature.
Let me document how this should be implemented. Imagine upscaling from 1000px to 16001px (a ratio that is not an exact power of 4) with a 4x model:
`math.log(16001 / 1000, 4) = 2.00005`, so Real-ESRGAN needs to be run twice (a `4 ** 2 = 16`x upscale). I set a threshold here: the pre-upscale is only enabled if the fractional part is lower than 0.5. Then `math.ceil(16001 / (4 ** 2)) = 1001`, so the image is resized from 1000px to 1001px with a classical algorithm before the two model passes.
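As a minimal sketch of that decision (the function name and return shape are illustrative, not the GUI's actual code):

```python
import math

def plan_upscale(src_px: int, dst_px: int, model_scale: int = 4,
                 threshold: float = 0.5) -> tuple[int, int]:
    """Return (pre_upscale_px, passes): the size to reach with a classical
    resize before the first model pass (src_px when pre-upscale is skipped),
    and how many model passes to run afterwards."""
    exact = math.log(dst_px / src_px, model_scale)  # log4(16.001) ≈ 2.00005
    passes = math.ceil(exact)                       # passes without pre-upscale
    frac = exact - math.floor(exact)
    if 0 < frac < threshold:
        # Small fractional part: a slight classical pre-upscale saves a
        # whole model pass (e.g. 1000px -> 1001px, then two 4x passes).
        passes -= 1
        return math.ceil(dst_px / model_scale ** passes), passes
    return src_px, passes

print(plan_upscale(1000, 16001))  # (1001, 2): resize to 1001px, run 4x twice
```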
Oh my goodness, this turns out to be a bit more complicated than I imagined.
You can try this feature in the latest build from Actions now.
Quick and dirty test. Input: a photoset with 39 files, landscape and portrait orientation, 700 × 465 px, scaled to 3500px width/height. Result:
I used XnConvert for the pre-upscale resize, but unfortunately I don't know its compression level (I guess 80%). Can you tell me what percentage is applied for the pre-upscale? Is it taken from the "lossy compression quality" field?
To find out whether the compression level is 80% or not, I applied Caesium to both outputs. The result: I gained a further 5-7% file-size reduction on the latest build's output and 82-87% on the manual procedure's output. It means the latest build did compress, but not at 80% (the value displayed in the "lossy compression quality" field).
That's all I can do at the moment. I will conduct a more comprehensive test later.
real-esrgan-ncnn-vulkan uses libwebp and Windows Imaging Component (only on Windows) / stb_image to save images. The webp that comes from real-esrgan-ncnn-vulkan is lossless. For jpg it uses the best quality the encoder supports (1.0 for WIC and 100 for stb_image).
The GUI's lossy compression only applies to the final output, and only if you enabled the option and the output extension is jpg or webp. In that case the GUI lets real-esrgan-ncnn-vulkan output a lossless webp, then compresses it itself. I enabled the maximum compression options (method 6 for webp; optimize and progressive for jpg) in this step. If the GUI's lossy compression is not enabled, the output of real-esrgan-ncnn-vulkan is used directly.
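For illustration, that recompression step could look like this minimal Pillow sketch (file names and the quality value are placeholders, not the GUI's actual code):

```python
from PIL import Image

# Re-encode the lossless webp written by real-esrgan-ncnn-vulkan with
# lossy settings; paths and quality are illustrative.
img = Image.open("realesrgan_output_lossless.webp")

# webp: method=6 is the slowest, best-compressing encoder effort level.
img.save("final.webp", quality=80, method=6)

# jpg: optimize/progressive enable the encoder's extra compression passes.
img.convert("RGB").save("final.jpg", quality=80, optimize=True, progressive=True)
```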
Pre-upscale outputs a lossless png/webp. The compression level of this intermediate file should have no effect on the compression level of the final output.
I have shown this app to my colleagues, and based on our discussion we have some notes related to multiple upscaling. A colleague found an alternative route and showed me it was worth doing. Its final output is slightly better than the first route (upsize → upscale), though not better than the original route (upscale → upscale → downsize).
Rather than immediately resizing to ¼ of the required final dimensions, we upscale first and then resize to ¼ of the required final dimensions. The last step is still a 4x upscale, so this route is: upscale → downsize → upscale.
This seems odd, but it is the best tradeoff between speed and quality. My previous test took roughly 15 minutes for the original route and 1.5 minutes for the first route, and the upsize step takes only a fraction of a second. So the second route needs less than 2x the time of the first route, i.e. under 3 minutes. It still saves a lot of time.
Why less than 2x? Because the dimensions of the first upscale (U1) are smaller than those of the second upscale (U2), and the first route effectively performs only U2, not U1.
We take the first route if the dimensional difference is less than 10%, and the second route otherwise. It is reasonable to keep both routes to optimize time; in our humble opinion, taking the second route for every dimensional difference is unnecessary, since we would not get a significant quality improvement.
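As a sketch of the automatic selection we have in mind (names are illustrative, not an actual API):

```python
def pick_route(src_px: int, pre_upscale_px: int, threshold: float = 0.10) -> str:
    """First route when the classical pre-upscale would change the dimension
    by less than 10%, otherwise the second route."""
    diff = (pre_upscale_px - src_px) / src_px
    return ("upsize -> upscale" if diff < threshold
            else "upscale -> downsize -> upscale")

print(pick_route(900, 1024))  # ~13.8% difference -> second route
```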
At the moment, realesrgan-gui already has a "Try to pre-upscale with general algorithm" option, and it does not need to be changed. The existence of the first and second routes does not need to be exposed to the user, and there is no interaction between the user and these two routes; the app automatically selects which route to use. This requires a little additional calculation (hopefully 😄). After this revision of the procedure, "Try to pre-upscale with general algorithm" would be slightly misleading, since AI upscaling is also involved in the pre-upscale step. IMO, it should be changed to "Try to pre-upscale first".
> The final output is slightly better ...
Really? Let's do an experiment ...
I randomly selected 20 images from Pixabay and resized their width to 4000px to create the source images. I then resized each to a random width between 500 and 900px, and finally upscaled back to 4000px with two methods: method A, the first route (upsize → upscale), and method B, the second route (upscale → downsize → upscale).
I measured the upscaling quality with SSIM (range 0-1; 1 means the two images are identical) against the source image. Here are the results:
Image | Method A (SSIM) | Method B (SSIM) |
---|---|---|
bird-8724916.webp | 0.8383716241128215 | 0.8084521998955357 |
cape-robin-chat-3457709.webp | 0.6689714033967844 | 0.6333938842469079 |
chevrolet-8647804.webp | 0.8739473732596035 | 0.8380585663438305 |
coffee-8684315.webp | 0.8954174533895213 | 0.8586715671519265 |
flamingo-8348527.webp | 0.8703230315343682 | 0.8341921357903818 |
flowers-882828.webp | 0.9220320752375305 | 0.8935422038970172 |
gem-3190526.webp | 0.9094632791085002 | 0.8819865801582157 |
hamburg-8573427.webp | 0.7909050157587569 | 0.7764784655858473 |
hinduism-8464313.webp | 0.8608063932901557 | 0.8082481234285294 |
lake-4839058.webp | 0.84792627627998 | 0.8186624343257443 |
lake-7624330.webp | 0.713461903930544 | 0.6743180925413687 |
lavender-8075280.webp | 0.9203500941541677 | 0.9038760834624219 |
nature-3112997.webp | 0.8117017197500049 | 0.7859409977430579 |
ocean-8408693.webp | 0.7318064691344405 | 0.6701012540354522 |
pink-algae-5389441.webp | 0.5377135373459919 | 0.4536347436015608 |
sand-7468945.webp | 0.563362584326837 | 0.5353643477371843 |
sea-2755858.webp | 0.6562101044241124 | 0.6117211383355793 |
straubing-8669480.webp | 0.7818840830184042 | 0.7591033255753219 |
stubble-8142239.webp | 0.8300588379881402 | 0.7646237076381848 |
woman-8619512.webp | 0.9310556298676326 | 0.9130578385451381 |
Method A always performs better than Method B. So I'm not going to change the pre-upscale procedure.
I wrote a script for testing. You can download the images from Pixabay (use links like https://pixabay.com/photos/bird-8724916/) or use other images and test it yourself.
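A rough sketch of such a harness, not the original script, assuming Pillow, scikit-image, and a realesrgan-ncnn-vulkan binary on PATH (the model name, file names, and sizes are illustrative):

```python
import random
import subprocess

import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

def realesrgan(src: str, dst: str) -> None:
    # One 4x model pass; the model name is an assumption.
    subprocess.run(["realesrgan-ncnn-vulkan", "-i", src, "-o", dst,
                    "-n", "realesrgan-x4plus"], check=True)

def ssim_vs(source: Image.Image, candidate_path: str) -> float:
    a = np.asarray(source.convert("L"))
    b = np.asarray(Image.open(candidate_path).convert("L")
                   .resize(source.size, Image.LANCZOS))
    return structural_similarity(a, b)

source = Image.open("bird-8724916.webp")  # 4000px-wide source image
w = random.randint(500, 900)
small = source.resize((w, source.height * w // source.width), Image.LANCZOS)
small.save("small.png")
quarter = (1000, source.height * 1000 // source.width)  # target / 4x model

# Method A (first route): classical upsize to 1/4 target, one 4x model pass.
small.resize(quarter, Image.LANCZOS).save("a_pre.png")
realesrgan("a_pre.png", "a_out.png")

# Method B (second route): 4x model pass, downsize to 1/4 target, 4x again.
realesrgan("small.png", "b_mid.png")
Image.open("b_mid.png").resize(quarter, Image.LANCZOS).save("b_pre.png")
realesrgan("b_pre.png", "b_out.png")

print("A:", ssim_vs(source, "a_out.png"), "B:", ssim_vs(source, "b_out.png"))
```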
Thank you for your effort in proving that method A (the first route) is always better than method B (the second route).
It's hard to convince other people (at least me) with subjective comments like "... looks better"; that's why we need objective image-quality metrics like PSNR, SSIM, Butteraugli, and so on.
The meaning of the experiment's result is that if a corresponding perfect image exists, method A will take your imperfect image closer to it than method B will.
It is maddening for users with limited hardware to have to upscale multiple times. If a single upscale of a single image takes 3 minutes to finish, how long do consecutive upscales take? n²? Even worse, Real-ESRGAN does not provide "realesrgan-x2plus" or "realesrgan-x3plus" models.
I've experienced this when upscaling 900px or smaller to 4K. The lesson I learned: I have to upscale without AI first, and then with AI.
I propose that realesrgan-gui provide an additional option, a select widget with these items:
This new option is only relevant to the "scale to width" and "scale to height" options, and it only applies when the target size is greater than the size produced by the first upscale pass.
There should be a visible note for this new option.