Open stolk opened 1 month ago
Thanks Bram, good catch. What do you think about moving the std::pow call down into the ternary operator, to avoid adding the extra std::max call?
Hi Doug,
I advise against the ternary operator...
I checked, and the compilers (gcc, clang) actually produce more efficient code with std::max
than with ?:
See this godbolt output.
...not authorized...
I expected having a CLA in place for OpenImageIO with ASWF, would transfer over to OpenColorIO?
In addition, I think this would fix the unit test errors. (You're clamping the value but the ternary expression needs the unclamped value.)
Oops, did I break the unit test? I'll revisit.
I was not proposing to replace the max with a new ternary. I was suggesting moving the pow call down into the existing ternary. As I mentioned, your unit tests are failing, so some action is needed.
The CLAs for all ASWF projects are managed through the same account but need to be signed separately for each project.
I was not proposing to replace the max with a new ternary. I was suggesting moving the pow call down into the existing ternary. As I mentioned, your unit tests are failing, so some action is needed.
Thanks Doug, I will redo this PR.
Strangely, the unmodified main branch fails the unit test as well. I have to find out what is going on here.
$ make test
Running tests...
Test project /home/stolk/src/OpenColorIO/build
Start 1: test_utils_exec
1/6 Test #1: test_utils_exec .................. Passed 0.01 sec
Start 2: test_cpu
2/6 Test #2: test_cpu .........................***Failed 5.78 sec
Start 3: test_cpu_no_accel
3/6 Test #3: test_cpu_no_accel ................***Failed 5.41 sec
Start 4: test_cpu_avx512
4/6 Test #4: test_cpu_avx512 ..................***Failed 5.99 sec
Start 5: test_gpu
5/6 Test #5: test_gpu .........................***Failed 8.83 sec
Start 6: test_python
6/6 Test #6: test_python ...................... Passed 1.38 sec
33% tests passed, 4 tests failed out of 6
Does the main branch HEAD pass the unit tests for you? Thanks.
All the tests should pass, except for the GPU tests on some mac platforms currently.
I have changed this PR to be less invasive, and only correct the value passed to powf()
and not the entire pixel value as used in the rest of the function.
Note: I cannot get the unit test to pass on my machine, even when using HEAD of main. Pushed the PR to see if CI will pass.
just to note, micro optimisation difference comes from the testing vs return order, if you flip the ternary around you can get the same code ... https://godbolt.org/z/Wzafa3nsb
Can a maintainer please approve the CI run, and then merge? Thanks.
Hi Bram, what do you think about changing this:
out[0] = pixel[0]<=red[3] ? pixel[0] * red[4] : data[0];
to this:
out[0] = pixel[0]<=red[3] ? pixel[0] red[4] : std::pow(pixel[0] red[0] + red[1], red[2]);
That way it avoids the extra branch.
No, that does not address the issue.
We should not be feeding negative numbers to the base of a powf().
My change makes sure of that, your version does not.
Hi Bram,
Is the reason that you don't like my proposal because you are worried that pixel[0] * red[0] + red[1]
will be negative? In practice, that is never the case. In theory, someone may be able to set parameter values that would place the breakpoint below zero, but if that is the edge case you are worried about, it would be better dealt with in the GammaOpData.cpp:GammaOpData::validateParameters function rather than inside the performance critical pixel loop.
This change prevents calling powf(x,y) with negative x.
The SIMD versions using ssePower() already seem to be resistent to negative pixel values, but the scalar version was not.
This would cause a SIGFPE on apps that have floating point exceptions enabled.
FIXES: #2066