Closed ziriax closed 4 years ago
I personally do not think I can improve the performance of that conversion further - it already uses a faster approximation instead of the pow() function for gamma-correction. With the pow() function it would convert in 1.6 seconds in your case, so it's already much faster than that.
Yes, it is a lot faster than pow already, but wouldn't a lookup table be faster for the 8-bit per channel case? And maybe SSE2 or AVX to process multiple channels at once (although a good compiler might do that vectorization out of the box).
Also, unrelated, do you have any references to papers that can be studied to understand your amazing algorithm?
Look-up table may be faster, but AVIR's design assumes floating point value space, so this is hard to implement via look-up tables. If gamma-correction is mission-critical for you, simply perform gamma-correction with your own code before resizing.
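The "own code before resizing" route could look roughly like the sketch below: decode 8-bit sRGB samples to linear floats, resize the float buffer with gamma correction disabled in the resizer, then encode the result back. The function names (decodeSrgb8, encodeSrgb8) are hypothetical, the constants are the standard sRGB transfer function rather than AVIR's internal approximation, and every sample is treated as a color channel (an alpha channel would need to be skipped).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Standard sRGB decode: encoded value in [0, 1] -> linear light in [0, 1].
inline double srgbToLinear(double s) {
    return s <= 0.04045 ? s / 12.92 : std::pow((s + 0.055) / 1.055, 2.4);
}

// Standard sRGB encode: linear light in [0, 1] -> encoded value in [0, 1].
inline double linearToSrgb(double l) {
    return l <= 0.0031308 ? l * 12.92 : 1.055 * std::pow(l, 1.0 / 2.4) - 0.055;
}

// Before resizing: decode 8-bit sRGB samples to linear floats.
// (Hypothetical helper, not part of AVIR.)
std::vector<float> decodeSrgb8(const std::vector<std::uint8_t>& in) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = static_cast<float>(srgbToLinear(in[i] / 255.0));
    return out;
}

// After resizing: encode linear floats back to 8-bit sRGB.
// (Hypothetical helper, not part of AVIR.)
std::vector<std::uint8_t> encodeSrgb8(const std::vector<float>& in) {
    std::vector<std::uint8_t> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        assert(!std::isnan(in[i]));
        // Resampling filters can overshoot, so clamp to [0, 1] before encoding.
        const double l = std::min(1.0, std::max(0.0, static_cast<double>(in[i])));
        out[i] = static_cast<std::uint8_t>(std::lround(linearToSrgb(l) * 255.0));
    }
    return out;
}
```

This moves the conversion cost out of the resizer and into code the caller controls, so the decode loop can be optimized for a specific input type without touching the resizer itself.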
I can't provide any references as AVIR is mostly an original algorithm, I have not based it on any existing papers. The GitHub page already describes all critical elements of the algorithm.
Good point!
At first sight the gamma correction only improves the resampled image slightly for natural photos, so it might not be worth the CPU load. I need to do more tests...
You did an amazing job, I never encountered an image resampler that provides better visual quality, I always believed that the industry had settled with Lanczos plus some sharpening, but your code shows that innovation in this area is still possible.
For testing, I added a template function specialization for the uint8_t input channel case, using a lookup table, and that speeds up our tests from 650ms to 200ms, that is more than 3 times faster.
Would you accept a PR, or do you feel we should just do this as a preprocessing step?
Since a lookup table results in exactly the same image, this might also be a good optimization for other users.
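A minimal sketch of what such a specialization might look like is given below. It is not the code from the test described above and not AVIR's internals: decodeDirect, decodeSample and checkLutMatchesDirect are hypothetical names, and the standard sRGB decode formula stands in for whatever approximation the resizer uses. The point is only that a 256-entry table filled from the same function returns the same floats as calling that function per sample.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Direct sRGB decode for any unsigned integer sample type (hypothetical helper).
template <typename T>
float decodeDirect(T v) {
    const double s = static_cast<double>(v) / std::numeric_limits<T>::max();
    const double l = s <= 0.04045 ? s / 12.92
                                  : std::pow((s + 0.055) / 1.055, 2.4);
    return static_cast<float>(l);
}

// Generic path: compute the transfer function for every sample.
template <typename T>
float decodeSample(T v) {
    return decodeDirect(v);
}

// 8-bit specialization: only 256 inputs exist, so precompute them once.
template <>
inline float decodeSample<std::uint8_t>(std::uint8_t v) {
    static const std::array<float, 256> lut = [] {
        std::array<float, 256> t{};
        for (int i = 0; i < 256; ++i)
            t[i] = decodeDirect(static_cast<std::uint8_t>(i));
        return t;
    }();
    return lut[v];
}

// Debug-time check that the table path matches the direct path for all codes.
inline void checkLutMatchesDirect() {
    for (int i = 0; i < 256; ++i) {
        const std::uint8_t v = static_cast<std::uint8_t>(i);
        assert(decodeSample(v) == decodeDirect(v));
    }
}
```

The same trick does not carry over to the encode direction, because the resized values are floats rather than one of 256 codes, and a 16-bit input would need a 65536-entry table.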
When disabling sRGB -> linear, we get aliasing (moiré) in some of our test photos (similar to this image), so resampling in the linear domain seems to be a must.
In most cases, moiré happens not because of the non-linear value space, but because of orthogonal resizing. Orthogonal resizing is not how things work in nature. EWA resizing is what is needed to reduce diagonal aliasing, but AVIR does not go there, as it's too computationally expensive.
I'm not that much interested in including a special case of LUT for 8-bit resizing as it won't work for 16-bit images anyway.
Resizing in linear space usually results in too much contrast being lost - hence your impression that there is less moiré.
Thanks for the info, again.
That is interesting, are you suggesting that resizing in sRGB is the more correct thing to do? I guess it's all about trade-offs...
I do not insist that resampling in sRGB is correct, but sRGB->linear->resampling->sRGB destroys image color channel statistics, that's for sure. I do not have a definitive answer as I'm unaware of resampling methods that work in non-linear value space. Probably images should always have been linear in the first place.
If you are researching the topic and come across a suite of optically-resized images with histograms, let me know - I'd like to know if optical resizing retains contrast or not. Digital resizing via sRGB->linear->sRGB conversion clearly does not.
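As a small numeric illustration of why the two pipelines cannot give the same histogram, the sketch below averages two 8-bit values once directly on the encoded values and once in linear light. A two-sample average is far simpler than AVIR's filters, the function names are hypothetical, and the constants are the standard sRGB transfer function, so this shows the direction of the effect only, not its size in a real resize.

```cpp
#include <cassert>
#include <cmath>

// Standard sRGB decode: encoded value in [0, 1] -> linear light in [0, 1].
inline double srgbToLinear(double s) {
    return s <= 0.04045 ? s / 12.92 : std::pow((s + 0.055) / 1.055, 2.4);
}

// Standard sRGB encode: linear light in [0, 1] -> encoded value in [0, 1].
inline double linearToSrgb(double l) {
    return l <= 0.0031308 ? l * 12.92 : 1.055 * std::pow(l, 1.0 / 2.4) - 0.055;
}

// Average of two 8-bit codes taken directly on the encoded values.
inline double averageEncoded(int a, int b) {
    assert(a >= 0 && a <= 255 && b >= 0 && b <= 255);
    return (a + b) / 2.0;
}

// Average taken in linear light, then re-encoded to the 0..255 scale.
inline double averageLinear(int a, int b) {
    assert(a >= 0 && a <= 255 && b >= 0 && b <= 255);
    const double l = (srgbToLinear(a / 255.0) + srgbToLinear(b / 255.0)) / 2.0;
    return linearToSrgb(l) * 255.0;
}
```

For a black pixel next to a white one, averageEncoded(0, 255) gives 127.5, while averageLinear(0, 255) lands noticeably higher, in the upper 180s. Because the decode curve is convex, the linear-light average of unequal values is never darker than the encoded average, so fine light/dark detail resizes to a brighter tone and the channel statistics shift. Flat areas are unaffected: averageLinear(v, v) returns v up to rounding.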
Sorry, I am not a researcher, just a graphics developer.
I tried your other Lanczos-based algorithm, and for downsizing, I could not see any visual difference between it and AVIR. Do you have test images that show the advantage of AVIR over LANCIR for downsampling?
Thanks again
You should probably check different resizing factors; there is a difference between the algorithms, though it's not readily apparent. But of course, Lanczos being so popular is a pretty safe bet in itself.
It seems the conversion sRGB <-> linear takes a large portion of the time spent.
On my i7 7700 (using msvc, full opt, SSE2), resizing our 30MPx sRGB test photo to about 1/12th of its size takes 650ms on average, applying gamma correction. Without gamma correction, it takes 200ms. Approximating the gamma correction with sqrt(2) takes 220ms (but isn't correct of course).
Do you think the performance of this part of your amazing algorithm could be improved?
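For reference, here is a sketch of the square-root shortcut mentioned above, assuming "sqrt(2)" means a plain gamma of 2.0 (square to decode, square root to encode). The function names are hypothetical and the "exact" curve is the standard sRGB transfer function. maxDecodeError reports how far the shortcut deviates from the exact decode over all 8-bit codes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Exact standard sRGB decode: encoded value in [0, 1] -> linear light.
inline double srgbToLinear(double s) {
    return s <= 0.04045 ? s / 12.92 : std::pow((s + 0.055) / 1.055, 2.4);
}

// Gamma 2.0 shortcut: decoding is a single multiplication.
inline double approxDecode(double s) {
    return s * s;
}

// Gamma 2.0 shortcut: encoding is a single square root.
inline double approxEncode(double l) {
    assert(l >= 0.0);
    return std::sqrt(l);
}

// Largest absolute decode error of the shortcut over all 8-bit codes,
// measured in linear light on a 0..1 scale.
inline double maxDecodeError() {
    double worst = 0.0;
    for (int i = 0; i < 256; ++i) {
        const double s = i / 255.0;
        worst = std::max(worst, std::fabs(approxDecode(s) - srgbToLinear(s)));
    }
    return worst;
}
```

The shortcut is self-consistent, since approxEncode(approxDecode(s)) returns s, so unresized areas keep their values. The filtering itself, however, happens in a space that is only roughly linear, which is why the result is cheaper but not correct.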