A fundamental question: does realesrgan-gui perform multiple upscales when the "scale to width" or "scale to height" value is greater than the size produced by the first pass of the upscale model, or does it run a single upscale and ignore the value?
I don't know what "upscale it without AI" means.
Multiple upscales are performed if a single upscale doesn't reach the specified resolution. If you upscale a 900px image to 4K with a 4x model, the GUI will run Real-ESRGAN twice: upscale 900px to 3600px, then 3600px to 14400px, then downsample to 4K. This is also what waifu2x-caffe does, as you can notice from its upscaling time, which jumps whenever another pass becomes necessary: … = 1.99x = 2.00x < 2.01x = … = 3.99x = 4.00x < 4.01x = …
You may think that upscaling from 3600px to 14400px with Real-ESRGAN is a waste in this case, but I see it more as a compromise. Upscaling from 3600px to 4K with Lanczos, bicubic, or anything else instead of running Real-ESRGAN makes no sense.
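To make the arithmetic concrete, here is a minimal sketch of the pass count for this route (not the GUI's actual code; 3840px stands in for 4K):

```python
import math

# Keep running the 4x model until the target is reached, then downsample.
src, target, scale = 900, 3840, 4
passes = math.ceil(math.log(target / src, scale))  # ceil(log4(4.27)) = 2
final = src * scale ** passes                      # 900 * 4**2 = 14400
print(passes, final)  # 2 passes -> 14400px, then downsample to 3840px
```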
> I don't know what "upscale it without AI" means.
It means simple resizing with classical algorithms to the nearest larger dimension.
Suppose I want my picture to be 4096px and my current picture is 900px. The nearest larger dimension is then 1024px (4096 ÷ 4). Before upscaling with the 4x model, I resize my image from 900px to 1024px.
This will save a lot of time.
This also means no more downsampling if "upscale without AI first, followed by AI" is selected.
FYI, XnConvert (or any similar app) takes only a few seconds to resize (Lanczos) dozens of images.
I accept and plan to implement this feature.
Let me document how this should be implemented. Imagine upscaling from 1000px to 16001px (a ratio that is not an exact power of 4) with a 4x model:
`math.log(16001 / 1000, 4) = 2.00005`, so Real-ESRGAN needs to be run twice (a `4 ** 2 = 16`x upscale). I set a threshold here: the pre-upscale is only enabled if the fractional part is lower than 0.5. Then `math.ceil(16001 / (4 ** 2)) = 1001`, so the image is resized from 1000px to 1001px with a classical algorithm before the two model passes.
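As a minimal sketch of that decision (the function name and return shape are illustrative, not the GUI's actual code):

```python
import math

def plan_upscale(src_px: int, dst_px: int, model_scale: int = 4,
                 threshold: float = 0.5) -> tuple[int, int]:
    """Return (pre_upscale_px, passes): the size to reach with a classical
    resize before the first model pass (src_px when pre-upscale is skipped),
    and how many model passes to run afterwards."""
    exact = math.log(dst_px / src_px, model_scale)  # log4(16.001) ≈ 2.00005
    passes = math.ceil(exact)                       # passes without pre-upscale
    frac = exact - math.floor(exact)
    if 0 < frac < threshold:
        # Small fractional part: a slight classical pre-upscale saves a
        # whole model pass (e.g. 1000px -> 1001px, then two 4x passes).
        passes -= 1
        return math.ceil(dst_px / model_scale ** passes), passes
    return src_px, passes

print(plan_upscale(1000, 16001))  # (1001, 2): resize to 1001px, run 4x twice
```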
Oh my goodness, this turns out to be a bit more complicated than I imagined.
You can try this feature in the latest build from Actions now.
Quick and dirty test. Input: a photoset with 39 files, landscape and portrait orientation, 700 × 465 px, scaled to 3500px width/height. Result:
I used XnConvert for the pre-upscale resize, but unfortunately I don't know its compression level (I guess 80%). Can you tell me what percentage is applied for the pre-upscale? Is it taken from the "lossy compression quality" field?
To find out whether the compression level is 80% or not, I applied Caesium to both outputs. The result: I gained a further 5-7% file-size reduction on the latest build's output and 82-87% on the manual procedure's output. It means the latest build did compress, but not at 80% (the value displayed in the "lossy compression quality" field).
That's all I can do at the moment. I will conduct a more comprehensive test later.
real-esrgan-ncnn-vulkan uses libwebp and Windows Imaging Component (only on Windows) / stb_image to save images. The webp that comes from real-esrgan-ncnn-vulkan is lossless. For jpg it uses the best quality the encoder supports (1.0 for WIC and 100 for stb_image).
The GUI's lossy compression only applies to the final output, and only if you enabled the option and the output extension is jpg or webp. In that case the GUI lets real-esrgan-ncnn-vulkan output a lossless webp, then compresses it itself. I enabled the maximum compression options (method 6 for webp; optimize and progressive for jpg) in this step. If the GUI's lossy compression is not enabled, the output of real-esrgan-ncnn-vulkan is used directly.
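For illustration, that recompression step could look like this minimal Pillow sketch (file names and the quality value are placeholders, not the GUI's actual code):

```python
from PIL import Image

# Re-encode the lossless webp written by real-esrgan-ncnn-vulkan with
# lossy settings; paths and quality are illustrative.
img = Image.open("realesrgan_output_lossless.webp")

# webp: method=6 is the slowest, best-compressing encoder effort level.
img.save("final.webp", quality=80, method=6)

# jpg: optimize/progressive enable the encoder's extra compression passes.
img.convert("RGB").save("final.jpg", quality=80, optimize=True, progressive=True)
```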
Pre-upscale outputs a lossless png/webp. The compression level of this intermediate file should have no effect on the compression level of the final output.
I have shown this app to my colleagues, and based on our discussion we have some notes related to multiple upscaling. A colleague found an alternative route and showed me it was worth doing. Its final output is slightly better than the first route (upsize → upscale), though not better than the original route (upscale → upscale → downsize).
Rather than immediately resizing to ¼ of the required final dimensions, we upscale first and then resize to ¼ of the required final dimensions. The last step is still a 4x upscale, so this route is: upscale → downsize → upscale.
This seems odd, but it is the best tradeoff between speed and quality. My previous test took roughly 15 minutes for the original route and 1.5 minutes for the first route, and the upsize step takes only a fraction of a second. So the second route needs less than 2x the time of the first route, i.e. under 3 minutes. It still saves a lot of time.
Why less than 2x? Because the dimensions of the first upscale (U1) are smaller than those of the second upscale (U2), and the first route effectively performs only U2, not U1.
We take the first route if the dimensional difference is less than 10%, and the second route otherwise. It is reasonable to keep both routes to optimize time; in our humble opinion, taking the second route for every dimensional difference is unnecessary, since we would not get a significant quality improvement.
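As a sketch of the automatic selection we have in mind (names are illustrative, not an actual API):

```python
def pick_route(src_px: int, pre_upscale_px: int, threshold: float = 0.10) -> str:
    """First route when the classical pre-upscale would change the dimension
    by less than 10%, otherwise the second route."""
    diff = (pre_upscale_px - src_px) / src_px
    return ("upsize -> upscale" if diff < threshold
            else "upscale -> downsize -> upscale")

print(pick_route(900, 1024))  # ~13.8% difference -> second route
```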
At the moment, realesrgan-gui already has a "Try to pre-upscale with general algorithm" option, and it does not need to be changed. The existence of the first and second routes does not need to be exposed to the user, and there is no interaction between the user and these two routes; the app automatically selects which route to use. This requires a little additional calculation (hopefully 😄). After this revision of the procedure, "Try to pre-upscale with general algorithm" would be slightly misleading, since AI upscaling is also involved in the pre-upscale step. IMO, it should be changed to "Try to pre-upscale first".
> The final output is slightly better ...
Really? Let's do an experiment ...
I randomly selected 20 images from Pixabay and resized their width to 4000px to create the source images. I then resized each to a random width between 500 and 900px, and finally upscaled back to 4000px with two methods: method A, the first route (upsize → upscale), and method B, the second route (upscale → downsize → upscale).
I measured the upscaling quality with SSIM (range 0-1; 1 means the two images are identical) against the source image. Here are the results:
Image | Method A (SSIM) | Method B (SSIM) |
---|---|---|
bird-8724916.webp | 0.8383716241128215 | 0.8084521998955357 |
cape-robin-chat-3457709.webp | 0.6689714033967844 | 0.6333938842469079 |
chevrolet-8647804.webp | 0.8739473732596035 | 0.8380585663438305 |
coffee-8684315.webp | 0.8954174533895213 | 0.8586715671519265 |
flamingo-8348527.webp | 0.8703230315343682 | 0.8341921357903818 |
flowers-882828.webp | 0.9220320752375305 | 0.8935422038970172 |
gem-3190526.webp | 0.9094632791085002 | 0.8819865801582157 |
hamburg-8573427.webp | 0.7909050157587569 | 0.7764784655858473 |
hinduism-8464313.webp | 0.8608063932901557 | 0.8082481234285294 |
lake-4839058.webp | 0.84792627627998 | 0.8186624343257443 |
lake-7624330.webp | 0.713461903930544 | 0.6743180925413687 |
lavender-8075280.webp | 0.9203500941541677 | 0.9038760834624219 |
nature-3112997.webp | 0.8117017197500049 | 0.7859409977430579 |
ocean-8408693.webp | 0.7318064691344405 | 0.6701012540354522 |
pink-algae-5389441.webp | 0.5377135373459919 | 0.4536347436015608 |
sand-7468945.webp | 0.563362584326837 | 0.5353643477371843 |
sea-2755858.webp | 0.6562101044241124 | 0.6117211383355793 |
straubing-8669480.webp | 0.7818840830184042 | 0.7591033255753219 |
stubble-8142239.webp | 0.8300588379881402 | 0.7646237076381848 |
woman-8619512.webp | 0.9310556298676326 | 0.9130578385451381 |
Method A always performs better than Method B. So I'm not going to change the pre-upscale procedure.
I wrote a script for testing. You can download the images from Pixabay (use links like https://pixabay.com/photos/bird-8724916/) or use other images and test it yourself.
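A rough sketch of such a harness, not the original script, assuming Pillow, scikit-image, and a realesrgan-ncnn-vulkan binary on PATH (the model name, file names, and sizes are illustrative):

```python
import random
import subprocess

import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

def realesrgan(src: str, dst: str) -> None:
    # One 4x model pass; the model name is an assumption.
    subprocess.run(["realesrgan-ncnn-vulkan", "-i", src, "-o", dst,
                    "-n", "realesrgan-x4plus"], check=True)

def ssim_vs(source: Image.Image, candidate_path: str) -> float:
    a = np.asarray(source.convert("L"))
    b = np.asarray(Image.open(candidate_path).convert("L")
                   .resize(source.size, Image.LANCZOS))
    return structural_similarity(a, b)

source = Image.open("bird-8724916.webp")  # 4000px-wide source image
w = random.randint(500, 900)
small = source.resize((w, source.height * w // source.width), Image.LANCZOS)
small.save("small.png")
quarter = (1000, source.height * 1000 // source.width)  # target / 4x model

# Method A (first route): classical upsize to 1/4 target, one 4x model pass.
small.resize(quarter, Image.LANCZOS).save("a_pre.png")
realesrgan("a_pre.png", "a_out.png")

# Method B (second route): 4x model pass, downsize to 1/4 target, 4x again.
realesrgan("small.png", "b_mid.png")
Image.open("b_mid.png").resize(quarter, Image.LANCZOS).save("b_pre.png")
realesrgan("b_pre.png", "b_out.png")

print("A:", ssim_vs(source, "a_out.png"), "B:", ssim_vs(source, "b_out.png"))
```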
Thank you for your effort in proving that method A (the first route) is always better than method B (the second route).
It's hard to convince other people (at least me) with subjective comments like "... looks better"; that's why we need objective image-quality metrics like PSNR, SSIM, Butteraugli, and so on.
The meaning of the experiment's result is that if a corresponding perfect image exists, method A will take your imperfect image closer to it than method B will.
It is maddening for users with limited hardware to have to upscale multiple times. If a single upscale of a single image takes 3 minutes to finish, how long do consecutive upscales take? n²? Even worse, Real-ESRGAN does not provide "realesrgan-x2plus" or "realesrgan-x3plus" models.
I've experienced this when upscaling 900px or smaller to 4K. The lesson I learned: I have to upscale without AI first, and then with AI.
I propose that realesrgan-gui provide an additional option, a select widget with these items:
This new option is only relevant to the "scale to width" and "scale to height" options, and it only applies when the target size is greater than the size produced by the first upscale pass.
There should be a visible note for this new option.