Open kxxt opened 1 year ago
This is also reproducible on x86_64 platforms with the following steps
CXXFLAGS='-mfma' cmake -GNinja -Bbuild -DOCIO_USE_SSE=OFF
ninja -C build
cd build/tests/cpu
./test_cpu_exec
Failed tests: https://pastebin.pl/view/fd57c16e
I can send a PR to fix the failing tests when FMA is used.
Thank you for investigating this @kxxt. I think we would welcome a PR that fixes the test as long as the tests for our usual compilers don't get significantly worse. So there should be some mechanism for deciding when to use the looser tests. Do you have a suggestion for how to do that?
Thanks for posting the output from the failing tests. It looks like the first one will already be fixed in this pending PR: https://github.com/AcademySoftwareFoundation/OpenColorIO/pull/1687
The following ones in CPUProcessor_tests.cpp look like they could be fixed by allowing a tolerance of +/- 1. However, we should do this in a way that does not remove the exact test for our usual compilers, some sort of "loose mode".
The file format tests could be skipped when in this mode.
The GammaOpCPU_tests.cpp tolerance could be loosened to 2e-7 when in this mode.
Hi @doug-walker ,
So there should be some mechanism for deciding when to use the looser tests.
I don't think there is any reliable way to detect FMA other than compiling a sample program and compare its output value. For gcc, -ffp-contract
is set to fast
by default. The bug doesn't occur on x86_64 because gcc by default targets generic x86_64 that doesn't have FMA instructions.
as long as the tests for our usual compilers don't get significantly worse.
Technically speaking, the tests are incorrect and looser tests are correct. The language standard makes no guarantee that we will get the hard-coded results in the tests. Whether and how expressions are contracted is implementation-defined. [^1]
A floating expression may be contracted, that is, evaluated as though it were an atomic operation, thereby omitting rounding errors implied by the source code and the expression evaluation method. The
FP_CONTRACT
pragma in<math.h>
provides a way to disallow contracted expressions. Otherwise, whether and how expressions are contracted is implementation-defined.
[^1]: C99 standard, page 80, https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf
Hi,
Some tests are failing on riscv64 architecture because the results on riscv64 is more precise than other architectures.
For example, test
GammaOpCPU, apply_moncurve_mirror_style_fwd
intests/cpu/ops/gamma/GammaOpCPU_tests.cpp
fails:Detailed output from failed test
```c++ rgba[0] = 0.94786727428436279 0.83333331346511841 0.71428573131561279 0.625 rgba[1] = 0.052132699638605118 0.1666666716337204 0.28571429848670959 0.375 rgba[2] = 2.4000000953674316 2.2000000476837158 2 1.7999999523162842 rgba[3] = 0.039285715669393539 0.1666666716337204 0.40000000596046448 0.75 rgba[4] = 0.077380158007144928 0.44192609190940857 0.8163265585899353 0.98202729225158691 Iteration 0 sign = 1 1 1 1 pixel = 0.00050000002374872565 0.004999999888241291 0.05000000074505806 0.75 repro: pixel[2]=1028443341, blu[0] = 1060559726, blu[1] = 1049774373 intermediate = 0.052606634795665741 0.17083333432674408 0.32142859697341919 0.84375 data = 0.00085211289115250111 0.02049555629491806 0.10331634432077408 0.73652046918869019 out = 3.8690079236403108e-05 0.0022096303291618824 0.040816329419612885 0.73652046918869019 Iteration 1 sign = -1 -1 -1 -1 pixel = 0.00050000002374872565 0.004999999888241291 0.05000000074505806 0.75 repro: pixel[2]=1028443341, blu[0] = 1060559726, blu[1] = 1049774373 intermediate = 0.052606634795665741 0.17083333432674408 0.32142859697341919 0.84375 data = 0.00085211289115250111 0.02049555629491806 0.10331634432077408 0.73652046918869019 out = -3.8690079236403108e-05 -0.0022096303291618824 -0.040816329419612885 -0.73652046918869019 Iteration 2 sign = 1 1 1 1 pixel = 0.25 0.5 0.75 1 repro: pixel[2]=1061158912, blu[0] = 1060559726, blu[1] = 1049774373 intermediate = 0.28909951448440552 0.58333331346511841 0.82142859697341919 1 data = 0.050876077264547348 0.30550399422645569 0.67474496364593506 1 out = 0.050876077264547348 0.30550399422645569 0.67474496364593506 1 Iteration 3 sign = -1 -1 -1 -1 pixel = 0.25 0.5 0.75 1 repro: pixel[2]=1061158912, blu[0] = 1060559726, blu[1] = 1049774373 intermediate = 0.28909951448440552 0.58333331346511841 0.82142859697341919 1 data = 0.050876077264547348 0.30550399422645569 0.67474496364593506 1 out = -0.050876077264547348 -0.30550399422645569 -0.67474496364593506 -1 Iteration 4 sign = 1 1 1 1 pixel = 0.80000001192092896 0.94999998807907104 1 1.5 repro: pixel[2]=1065353216, blu[0] = 1060559726, blu[1] = 1049774373 intermediate = 0.81042653322219849 0.95833331346511841 1 1.3125 data = 0.60382729768753052 0.91061854362487793 1 1.6314687728881836 out = 0.60382729768753052 0.91061854362487793 1 1.6314687728881836 Iteration 5 sign = -1 -1 -1 -1 pixel = 0.80000001192092896 0.94999998807907104 1 1.5 repro: pixel[2]=1065353216, blu[0] = 1060559726, blu[1] = 1049774373 intermediate = 0.81042653322219849 0.95833331346511841 1 1.3125 data = 0.60382729768753052 0.91061854362487793 1 1.6314687728881836 out = -0.60382729768753052 -0.91061854362487793 -1 -1.6314687728881836 Iteration 6 sign = 1 1 1 1 pixel = 1.0049999952316284 1.0499999523162842 1.5 1 repro: pixel[2]=1069547520, blu[0] = 1060559726, blu[1] = 1049774373 intermediate = 1.0047392845153809 1.0416666269302368 1.3571429252624512 1 data = 1.0114120244979858 1.0939645767211914 1.8418369293212891 1 out = 1.0114120244979858 1.0939645767211914 1.8418369293212891 1 Iteration 7 sign = -1 -1 -1 -1 pixel = 1.0049999952316284 1.0499999523162842 1.5 1 repro: pixel[2]=1069547520, blu[0] = 1060559726, blu[1] = 1049774373 intermediate = 1.0047392845153809 1.0416666269302368 1.3571429252624512 1 data = 1.0114120244979858 1.0939645767211914 1.8418369293212891 1 out = -1.0114120244979858 -1.0939645767211914 -1.8418369293212891 -1 Iteration 8 sign = -1 1 1 1 pixel = inf inf nan 0 repro: pixel[2]=2143289344, blu[0] = 1060559726, blu[1] = 1049774373 intermediate = inf inf nan 0.375 data = inf inf nan 0.17110247910022736 out = -inf inf nan 0 Result[0] = 3.8689999200869352e-05 Image [0] = 3.8690079236403108e-05 Result[1] = 0.0022096300963312387 Image [1] = 0.0022096303291618824 Result[2] = 0.040816318243741989 Image [2] = 0.040816329419612885 Result[3] = 0.73652046918869019 Image [3] = 0.73652046918869019 Result[4] = -3.8689999200869352e-05 Image [4] = -3.8690079236403108e-05 Result[5] = -0.0022096300963312387 Image [5] = -0.0022096303291618824 Result[6] = -0.040816318243741989 Image [6] = -0.040816329419612885 Result[7] = -0.73652046918869019 Image [7] = -0.73652046918869019 Result[8] = 0.050876069813966751 Image [8] = 0.050876077264547348 Result[9] = 0.30550399422645569 Image [9] = 0.30550399422645569 Result[10] = 0.67474484443664551 Image [10] = 0.67474496364593506 /usr/src/debug/opencolorio/OpenColorIO-2.2.1/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:710: FAILED: Index: 10 - Values: 0.674744964 and: 0.674744844 - Threshold: 1.00000001e-07 Result[11] = 1 Image [11] = 1 Result[12] = -0.050876069813966751 Image [12] = -0.050876077264547348 Result[13] = -0.30550399422645569 Image [13] = -0.30550399422645569 Result[14] = -0.67474484443664551 Image [14] = -0.67474496364593506 /usr/src/debug/opencolorio/OpenColorIO-2.2.1/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:710: FAILED: Index: 14 - Values: -0.674744964 and: -0.674744844 - Threshold: 1.00000001e-07 Result[15] = -1 Image [15] = -1 Result[16] = 0.60382729768753052 Image [16] = 0.60382729768753052 Result[17] = 0.91061854362487793 Image [17] = 0.91061854362487793 Result[18] = 1 Image [18] = 1 Result[19] = 1.6314687728881836 Image [19] = 1.6314687728881836 Result[20] = -0.60382729768753052 Image [20] = -0.60382729768753052 Result[21] = -0.91061854362487793 Image [21] = -0.91061854362487793 Result[22] = -1 Image [22] = -1 Result[23] = -1.6314687728881836 Image [23] = -1.6314687728881836 Result[24] = 1.0114120244979858 Image [24] = 1.0114120244979858 Result[25] = 1.0939645767211914 Image [25] = 1.0939645767211914 Result[26] = 1.8418365716934204 Image [26] = 1.8418369293212891 /usr/src/debug/opencolorio/OpenColorIO-2.2.1/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:710: FAILED: Index: 26 - Values: 1.84183693 and: 1.84183657 - Threshold: 1.00000001e-07 Result[27] = 1 Image [27] = 1 Result[28] = -1.0114120244979858 Image [28] = -1.0114120244979858 Result[29] = -1.0939645767211914 Image [29] = -1.0939645767211914 Result[30] = -1.8418365716934204 Image [30] = -1.8418369293212891 /usr/src/debug/opencolorio/OpenColorIO-2.2.1/tests/cpu/ops/gamma/GammaOpCPU_tests.cpp:710: FAILED: Index: 30 - Values: -1.84183693 and: -1.84183657 - Threshold: 1.00000001e-07 Result[31] = -1 Image [31] = -1 Result[32] = -inf Image [32] = -inf Result[33] = inf Image [33] = inf Result[34] = nan Image [34] = nan Result[35] = 0 Image [35] = 0 ```Note that I modified
src/OpenColorIO/ops/gamma/GammaOpCPU.cpp
andtests/cpu/ops/gamma/GammaOpCPU.cpp
to get more outputs.Modified source
```c++ void GammaMoncurveMirrorOpCPUFwd::apply(const void* inImg, void* outImg, long numPixels) const { const float* in = (const float*)inImg; float* out = (float*)outImg; const float red[5] = { m_red.scale, m_red.offset, m_red.gamma, m_red.breakPnt, m_red.slope }; const float grn[5] = { m_green.scale, m_green.offset, m_green.gamma, m_green.breakPnt, m_green.slope }; const float blu[5] = { m_blue.scale, m_blue.offset, m_blue.gamma, m_blue.breakPnt, m_blue.slope }; const float alp[5] = { m_alpha.scale, m_alpha.offset, m_alpha.gamma, m_alpha.breakPnt, m_alpha.slope }; std::cout.precision(17); for (int i = 0; i < 5; i++) { std::cout << "rgba[" << i << "] = " << red[i] << " " << grn[i] << " " << blu[i] << " " << alp[i] << std::endl; } for (long idx = 0; idx < numPixels; ++idx) { std::cout << "Iteration " << idx << std::endl; const float sign[4] = { std::copysign(1.0f, in[0]), std::copysign(1.0f, in[1]), std::copysign(1.0f, in[2]), std::copysign(1.0f, in[3]) }; std::cout << "sign = " << sign[0] << " " << sign[1] << " " << sign[2] << " " << sign[3] << std::endl; const float pixel[4] = { std::fabs(in[0]), std::fabs(in[1]), std::fabs(in[2]), std::fabs(in[3]) }; std::cout << "pixel = " << pixel[0] << " " << pixel[1] << " " << pixel[2] << " " << pixel[3] << std::endl; std::cout << "repro: pixel[2]=" << std::hex << *(int*)(&pixel[2]) << ", blu[0] = " << *(int*)(&blu[0]) << ", blu[1] = " << *(int*)(&blu[1]) << std::endl; const float intermediate[4] = { pixel[0] * red[0] + red[1], pixel[1] * grn[0] + grn[1], pixel[2] * blu[0] + blu[1], pixel[3] * alp[0] + alp[1] }; std::cout << "intermediate = " << intermediate[0] << " " << intermediate[1] << " " << intermediate[2] << " " << intermediate[3] << std::endl; const float data[4] = { std::pow(intermediate[0], red[2]), std::pow(intermediate[1], grn[2]), std::pow(intermediate[2], blu[2]), std::pow(intermediate[3], alp[2]) }; std::cout << "data = " << data[0] << " " << data[1] << " " << data[2] << " " << data[3] << std::endl; out[0] = sign[0] * (pixel[0] <= red[3] ? pixel[0] * red[4] : data[0]); out[1] = sign[1] * (pixel[1] <= grn[3] ? pixel[1] * grn[4] : data[1]); out[2] = sign[2] * (pixel[2] <= blu[3] ? pixel[2] * blu[4] : data[2]); out[3] = sign[3] * (pixel[3] <= alp[3] ? pixel[3] * alp[4] : data[3]); std::cout << "out = " << out[0] << " " << out[1] << " " << out[2] << " " << out[3] << std::endl; in += 4; out += 4; } ``` ```c++ void ApplyGamma(const OCIO::OpRcPtr & op, float * image, const float * result, long numPixels, unsigned line, float errorThreshold) { std::cout.precision(10); const auto cpu = op->getCPUOp(true); OCIO_CHECK_NO_THROW_FROM(cpu->apply(image, image, numPixels), line); for(long idx=0; idx<(numPixels*4); ++idx) { std::cout << "Result[" << idx << "] = " << std::setw(16) << result[idx] << std::endl; std::cout << "Image [" << idx << "] = " << std::setw(16) << image[idx] << std::endl; if (OCIO::IsNan(result[idx])) { OCIO_CHECK_ASSERT_FROM(OCIO::IsNan(image[idx]), line); continue; } // Using rel error with a large minExpected value of 1 will transition // from absolute error for expected values < 1 and // relative error for values > 1. const bool equalRel = OCIO::EqualWithSafeRelError(image[idx], result[idx], errorThreshold, 1.0f); if (!equalRel) { // As most of the error thresholds are 1e-7f, the output // value precision should then be bigger than 7 digits // to highlight small differences. std::ostringstream message; message.precision(9); message << "Index: " << idx << " - Values: " << image[idx] << " and: " << result[idx] << " - Threshold: " << errorThreshold; OCIO_CHECK_ASSERT_MESSAGE_FROM(0, message.str(), line); } } } ```Let's take a detailed look at the Iteration 2 from the log:
On x86_64 with SSE2 disabled, it looks like:
The difference of the 3rd element(at index 2) causes a test failure later.
After some investigation, I found that the difference is caused by FMA(Fused Multiply-Add).
The following expression
can be computed with two steps:
It is equivalent to
round(round(a * b) + c)
, which is the behavior on x86_64.However, with the current build config in this repo, gcc uses
fmadd.s
to compute the expr.fmadd.s <result_register> <a> <b> <c>
computesa * b + c
asround(a * b + c)
, which is of higher precision thanround(round(a * b) + c)
. And this difference finally caused the test failure.There are several ways to fix this and I would like to ask for your opinions.
-ffp-contract=off
to prevent the compiler from generating FMA instructions. Apparently this is not recommended because we will lose the benefits of the higher precision we get for free from FMA.