Traverse-Research / ispc-downsampler

Image downsampler using a Lanczos filter implemented in ISPC

Improve the downsampling algorithm #15

Closed: KYovchevski closed this 7 months ago

KYovchevski commented 2 years ago

While preparing the talk for UU, I noticed that the quality of images produced by our downsampler is very low compared to other downsamplers. I looked into why, and it turned out that we aren't taking nearly enough samples when downsampling. For example, going from 2048x2048 down to 512x512, we always sample with a 6x6 kernel, while other downsamplers use 12x12 and adapt that size further depending on the ratio between the source and target dimensions.

I also took inspiration from how other downsamplers handle working with large numbers of samples by caching some of the math for reuse.
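A minimal sketch of the weight-caching idea described above, assuming a separable Lanczos filter whose support widens with the downsampling ratio. The names `lanczos`, `Taps`, and `cache_weights` are hypothetical and are not taken from this PR; the actual ISPC kernel may count taps differently. With `a = 3.0` and a 4x reduction, this particular sketch yields 24 taps per axis for interior pixels, so it is not meant to reproduce the 12x12 figure quoted above.

```rust
use std::f32::consts::PI;

/// Lanczos kernel with `a` lobes: sinc(x) * sinc(x / a) for |x| < a, else 0.
fn lanczos(x: f32, a: f32) -> f32 {
    if x.abs() < 1e-6 {
        return 1.0;
    }
    if x.abs() >= a {
        return 0.0;
    }
    let px = PI * x;
    a * px.sin() * (px / a).sin() / (px * px)
}

/// Cached filter taps for one output coordinate along one axis.
struct Taps {
    /// Index of the first contributing source pixel.
    start: usize,
    /// Normalized weights for source pixels `start..start + weights.len()`.
    weights: Vec<f32>,
}

/// Hypothetical helper: compute the taps for every output coordinate once,
/// so the same weights can be reused for every row (or column) of the image.
fn cache_weights(src_len: usize, dst_len: usize, a: f32) -> Vec<Taps> {
    let ratio = src_len as f32 / dst_len as f32;
    // Assumption: when downsampling, stretch the kernel by the ratio so that
    // more source pixels contribute; never shrink it when upsampling.
    let scale = ratio.max(1.0);
    let radius = a * scale;

    (0..dst_len)
        .map(|i| {
            // Center of output pixel `i`, expressed in source pixel space.
            let center = (i as f32 + 0.5) * ratio;
            let start = (center - radius).floor().max(0.0) as usize;
            let end = ((center + radius).ceil() as usize).min(src_len);

            let mut weights: Vec<f32> = (start..end)
                .map(|s| lanczos((s as f32 + 0.5 - center) / scale, a))
                .collect();

            // Normalize so the taps sum to 1 and brightness is preserved.
            let sum: f32 = weights.iter().sum();
            for w in &mut weights {
                *w /= sum;
            }
            Taps { start, weights }
        })
        .collect()
}
```

The horizontal and vertical passes can each build such a table once per image. The per-pixel work is then a weighted sum over `weights`, with no trigonometry in the inner loop.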

The result is a new implementation that preserves image quality much better, but is about twice as slow. The performance can probably be improved by splitting the ISPC kernel in two, one for 3 channels and one for 4 channels, and doing the branch in Rust instead of relying on function pointers in ISPC. We might be able to squeeze out more performance with cache optimizations, but that needs further investigation.
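To illustrate the "branch in Rust" idea, here is a rough sketch in plain Rust, assuming the channel count is only known at run time. `average_pixels` and `average_dispatch` are hypothetical stand-ins, not functions from this crate. In the real code each match arm would call a separately exported ISPC kernel instead of a const-generic Rust function.

```rust
/// Hypothetical stand-in for a kernel specialized on the channel count `C`.
/// Because `C` is a compile-time constant, the inner loop has a fixed trip
/// count and needs no per-pixel indirection.
fn average_pixels<const C: usize>(pixels: &[u8]) -> [f32; C] {
    let mut acc = [0.0f32; C];
    let count = (pixels.len() / C).max(1) as f32;
    for px in pixels.chunks_exact(C) {
        for (a, &v) in acc.iter_mut().zip(px) {
            *a += v as f32;
        }
    }
    for a in &mut acc {
        *a /= count;
    }
    acc
}

/// Branch once on the run-time channel count, then run a specialized kernel.
/// Returns `None` for channel counts without a specialization.
fn average_dispatch(pixels: &[u8], channels: usize) -> Option<Vec<f32>> {
    match channels {
        3 => Some(average_pixels::<3>(pixels).to_vec()),
        4 => Some(average_pixels::<4>(pixels).to_vec()),
        _ => None,
    }
}
```

The branch runs once per image rather than once per sample, which is the property the suggestion above is after. Whether that actually recovers the lost time would need to be measured.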

The old implementation is kept in both ISPC and Rust, and can be invoked using downsample_fast.

KYovchevski commented 2 years ago

I did some testing with what affects the performance for the test case we have (3 channels, 2048x2048 -> 512x512), and have made some interesting observations.

MarijnS95 commented 1 year ago

Since the gist of this PR is improving quality (and, if I understand correctly, using weight "caching" to get there without outlandish times), how does it fare against resize from a quality perspective? main is already slower than resize (45ms vs 37ms) and this PR bumps us to 61ms:

Downsample `square_test.png` using ispc_downsampler                                                                            
                        time:   [61.593 ms 61.710 ms 61.838 ms]
                        change: [+36.286% +36.552% +36.809%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

Downsample `square_test.png` using resize                                                                            
                        time:   [37.770 ms 37.803 ms 37.839 ms]
                        change: [-0.2602% -0.1511% -0.0328%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 19 outliers among 100 measurements (19.00%)
  7 (7.00%) high mild
  12 (12.00%) high severe

EDIT: The win is mostly in debug/dev profiles:

$ cargo bench --profile dev
...
Downsample `square_test.png` using ispc_downsampler                                                                            
                        time:   [108.09 ms 108.13 ms 108.17 ms]
                        change: [+74.846% +75.217% +75.549%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild

Benchmarking Downsample `square_test.png` using resize: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 120.5s, or reduce sample count to 10.
Downsample `square_test.png` using resize                                                                            
                        time:   [1.1973 s 1.1994 s 1.2013 s]
                        change: [+3066.5% +3072.6% +3078.3%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
  8 (8.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  10 (10.00%) high severe