Closed MorganGrundy closed 4 years ago
CPU generation with CIELAB now fixed: 7d4a328fcbbf238aa3507099b314a7578bdc722b, 8bc7a95df6f40858851586b608d30d0f590d85ca
First commit (7d4a328fcbbf238aa3507099b314a7578bdc722b) seems to have massively improved performance (at least for RGB Euclidean + CIE76). Using the examples on my website for reference, the CPU times are now nearly as good as the CUDA times. Further benchmarks are needed.
CUDA generation with CIELAB now fixed: 67a84085d9318adf9386307aa0c1d897390d3b58
Did more benchmarks. Observations:
Since the CPU generation compares one library image at a time, each cell keeps a current best fit value. If, during a comparison, the running total ever exceeds the current best fit, the rest of that comparison can be skipped.
This optimisation is not used in the CUDA generation; there we simply calculate everything, but in parallel as much as possible.
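To illustrate the early-exit idea (the repository itself is C++/CUDA; this is just a language-agnostic Python sketch, and all names here are hypothetical, not taken from the codebase):

```python
def cell_variant(cell_pixels, lib_pixels, best_so_far):
    """Accumulate a squared-distance variant pixel by pixel.

    Returns the total, or None as soon as the running total exceeds
    best_so_far -- at that point the rest of the comparison is skipped,
    since this library image can no longer beat the current best fit.
    """
    total = 0.0
    for cell_px, lib_px in zip(cell_pixels, lib_pixels):
        total += sum((a - b) ** 2 for a, b in zip(cell_px, lib_px))
        if total > best_so_far:
            return None  # early exit
    return total

def best_fit(cell_pixels, library):
    """Find the index of the best-fitting library image for one cell."""
    best_index, best = -1, float("inf")
    for i, lib_pixels in enumerate(library):
        variant = cell_variant(cell_pixels, lib_pixels, best)
        if variant is not None and variant < best:
            best_index, best = i, variant
    return best_index
```

This is inherently sequential per cell (each comparison depends on the best value so far), which is why it maps poorly onto the CUDA version's compute-everything-in-parallel approach.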
So with RGB Euclidean + CIE76, the CUDA method does not outperform the CPU method (a more powerful GPU possibly could, but I cannot test that).
Images converted to the CIELAB colour space still use CV_8U (uchar), so the values are squeezed into the range [0, 255]. They should instead use float/double with the proper ranges: 0 <= L <= 100, -128 <= a <= 127, -128 <= b <= 127.
Apparently OpenCV scales L by 255/100 and translates a and b by 128.
The effect on CIE76 is that more weight is given to L (lightness), plus some accuracy is lost since the fractional digits are truncated. The effect on CIEDE2000 would be much more significant (which would explain why the CIEDE2000 results are not so good).
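A small sketch of that 8-bit encoding and why it skews CIE76 (pure Python, no OpenCV; function names are mine, but the 255/100 scale and the +128 shift match what OpenCV documents for 8-bit Lab):

```python
def lab_float_to_8u(L, a, b):
    """Encode floating-point Lab into OpenCV-style 8-bit values:
    L is scaled by 255/100, a and b are shifted by 128."""
    return (round(L * 255 / 100), round(a + 128), round(b + 128))

def lab_8u_to_float(L8, a8, b8):
    """Invert the 8-bit encoding (the rounding loss is not recoverable)."""
    return (L8 * 100 / 255, a8 - 128, b8 - 128)
```

Computing a squared Euclidean distance (CIE76) directly on the 8-bit values multiplies every L difference by 255/100 = 2.55, so the L term carries about 6.5x its proper weight relative to a and b, on top of the rounding loss. Any comparison should convert back to float ranges first.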