Traverse-Research / ispc-downsampler

Image downsampler using a Lanczos filter implemented in ISPC

Revert "lanczos3: Actually sample 7x7 instead of 6x6 (#27)" #31

Closed (MarijnS95 closed this 1 year ago)

MarijnS95 commented 1 year ago

This reverts commit 22e1bb1e734a66a1c577f0fa7bae02303c77ff2a.

The kernel size of the lanczos3 filter is 6x6, and sampling it at x=3.5 or y=3.5 results in a weight of 0, thus making these pixels completely irrelevant. This became clearer in #28, which simplified the offset passed to `lanczos3_filter()` to always be 0.5, so that the weight is read at the middle of each source pixel.
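To make the zero-weight claim concrete, here is a minimal Rust sketch of the textbook Lanczos kernel with a = 3. The function name `lanczos3` is hypothetical and this is not the repository's ISPC `lanczos3_filter()`, which may be written differently; the sketch only assumes the standard definition sinc(x) * sinc(x / 3) with support |x| < 3.

```rust
use std::f32::consts::PI;

/// Textbook Lanczos kernel with a = 3 (hypothetical helper, not the
/// repository's ISPC code): sinc(x) * sinc(x / 3) for |x| < 3, else 0.
fn lanczos3(x: f32) -> f32 {
    const A: f32 = 3.0;
    if x == 0.0 {
        // Limit of the product of sincs at the origin.
        return 1.0;
    }
    if x.abs() >= A {
        // Outside the kernel support every sample gets weight 0.
        return 0.0;
    }
    let px = PI * x;
    // sinc(x) * sinc(x / A) with sinc(t) = sin(pi * t) / (pi * t).
    A * px.sin() * (px / A).sin() / (px * px)
}
```

Under this definition, distances of 0.5, 1.5 and 2.5 fall inside the support and get non-zero weights, while `lanczos3(3.5)` hits the early return and is exactly 0, so a seventh row or column sampled at that distance cannot contribute to the result.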

Note that for an even reduction in image size, the center coordinate of every target pixel (what `uv` denotes) lies exactly on the boundary between two source pixels. The pixel at kernel position x=0,y=0 is therefore (barring float imprecision) to the right/bottom of the target pixel's center, so reading 3 pixels each to the left, top, right and bottom (with indices in the range [-3, 2]) is correct.

For odd reductions (e.g. 3x) this doesn't hold, and that is likely what the code removed in #28 was incorrectly trying to compensate for?
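The difference between even and odd reductions can be checked with a small Rust sketch. Both helpers below (`target_center_in_source` and `tap_distances`) are hypothetical names introduced for illustration and are not part of the crate; the sketch assumes integer reduction factors, pixel centers at index + 0.5, and distances measured in unscaled source pixels.

```rust
/// Center of target pixel `t` expressed in source-pixel coordinates,
/// assuming an integer reduction `factor` (hypothetical helper).
fn target_center_in_source(t: u32, factor: u32) -> f32 {
    (t as f32 + 0.5) * factor as f32
}

/// Signed distances from `center` to the centers of the source pixels at
/// index offsets `lo..=hi`, where offset 0 is the pixel at floor(center).
fn tap_distances(center: f32, lo: i32, hi: i32) -> Vec<f32> {
    let base = center.floor();
    (lo..=hi)
        .map(|i| base + i as f32 + 0.5 - center)
        .collect()
}
```

For a 2x reduction the first target center sits at 1.0, on a source-pixel boundary, and the offsets [-3, 2] give the symmetric distances -2.5, -1.5, -0.5, 0.5, 1.5, 2.5. For a 3x reduction the center sits at 1.5, in the middle of a source pixel, and the same offsets give -3, -2, -1, 0, 1, 2, which is no longer symmetric around the target center. Whether and how the real kernel should compensate for that is the open question in this thread.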

MarijnS95 commented 1 year ago

> For odd reductions (e.g. 3x) this doesn't hold, and that is likely what the code removed in #28 was incorrectly trying to compensate for?

@KYovchevski can you shed some light on this? Perhaps we should add part of that code back to make it work again?

MarijnS95 commented 1 year ago

> The kernel size of the lanczos3 filter is 6x6, and sampling it at x=3.5 or y=3.5 results in a weight of 0, thus making these pixels completely irrelevant

Quite funky that reading 7x7 - 6x6 = 13 more pixels per iteration (and weighting them with 0) has no effect on `cargo bench` timings.

MarijnS95 commented 1 year ago

We should really merge this. Before:

```
Downsample `square_test.png` using ispc_downsampler
                    time:   [54.067 ms 54.110 ms 54.154 ms]
```

After:

```
Downsample `square_test.png` using ispc_downsampler
                    time:   [43.947 ms 44.037 ms 44.175 ms]
                    change: [-18.798% -18.615% -18.374%] (p = 0.00 < 0.05)
                    Performance has improved.
```