Make possible to not use fast math

illwieckz commented 2 years ago

See:

https://github.com/Unity-Technologies/crunch/pull/16 by @blaztinn

He said:

Without fast math we get the same output for the same input on all the platforms and architectures.

Otherwise the shasum of output was different on different machines that compressed the same input.

So we may want to disable fast math, and maybe even pass -fno-fast-math explicitly to be 100% sure that some environment variable doesn't introduce it again. The related patch only affects the legacy Makefile, so someone has to implement it in CMake as well.
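To illustrate why a tiny floating-point difference is enough to change the shasum of an output file, here is a minimal Python sketch. It is not crunch code: the `checksum` helper and the two sums are hypothetical stand-ins for two builds that evaluate the same expression in a different order.

```python
import hashlib
import struct

def checksum(values):
    """SHA-256 of the values serialized as little-endian IEEE 754 doubles."""
    payload = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(payload).hexdigest()

# The same sum evaluated in two orders, as two differently
# optimized builds of the same source might do.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

digest_a = checksum([left_to_right])
digest_b = checksum([right_to_left])

print(left_to_right, right_to_left)
print("same checksum:", digest_a == digest_b)
```

Both sums are acceptable approximations of 0.6, but they differ in their last bit, so the serialized bytes and therefore the digests differ.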

illwieckz commented 5 months ago

Some quotes from:

https://github.com/Unity-Technologies/crunch/pull/16

illwieckz dixit:

If I'm right, some “fast math” options may not only disable some checks, but also make it possible to use dedicated hardware that may even do more precise computation. While that may not be wrong, it may produce images with slightly different colors and therefore a different checksum.

-ffast-math is more about selectively bypassing IEEE/ISO standard restrictions than it is to not care about precision. -- https://discourse.llvm.org/t/rfc-deprecate-ofast/78687/26

IEEE talks about digital precision. For example, mul + add may not have the same binary answer as mla, so IEEE assumes precision is lost. But it’s often gained. -- https://discourse.llvm.org/t/rfc-deprecate-ofast/78687/30
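The point about mul + add versus a fused multiply-add can be sketched in Python. This is only an illustration under stated assumptions: `fused_mul_add` is a hypothetical helper that emulates a fused operation with exact rational arithmetic and a single final rounding, and the input values are chosen by hand to make the effect visible.

```python
from fractions import Fraction

def mul_then_add(a, b, c):
    """Separate multiply and add: the product is rounded before the addition."""
    return a * b + c

def fused_mul_add(a, b, c):
    """Emulated fused multiply-add: exact a*b + c, rounded once at the end."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# Mathematically, (1 + 2^-30) * (1 - 2^-30) - 1 equals -2^-60.
separate = mul_then_add(a, b, c)
fused = fused_mul_add(a, b, c)

print("separate:", separate)
print("fused:   ", fused)
print("exact:   ", -(2.0**-60))
```

Here the separately rounded product becomes exactly 1.0, so the final result collapses to 0.0, while the fused form keeps the exact answer. Both are legitimate results of the same source expression, yet they are not bit-identical.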

So, maybe the checksum being different is not the symptom of a bug. But it is probably expected that using fast math breaks reproducibility because then the math functions or even hardware don't have to be conformant to some IEEE standard and even things like level of precision may differ across software/hardware implementations.

For example, with the Dæmon game engine we had to update some of our tests when we added an option to disable SSE, because the x87 computation then produced a slightly different result. It was not wrong; only the precision differed. SSE actually had higher precision than x87, so the result was 0.4261826 with SSE but 0.426183 with x87:

Fix trace tests with x87 floating point DaemonEngine/Daemon#1153

Since the tool is meant to produce distributable files, it seems like a good idea to have build options that guarantee the reproducibility of the result.

If someone implements a game engine that embeds libcrn to automatically convert PNG and JPG images to DDS/CRN and to store the generated DDS/CRN in a cache, it's probably fine to not care about reproducibility.

But someone implementing a toolchain like Urcheon for producing a distributable game with pre-computed DDS/CRN may want a knob to enable reproducibility, even at the expense of spending more time producing the released game.

I think I'll add to Dæmon's crunch a CMake option as a knob to favor reproducibility (and then disable fast math). This option will likely be enabled by default (fast math disabled by default).

blaztinn dixit:

Yes, that is also the conclusion I came to (with regard to fast math optimizations).

We are using this lib to produce the artifacts at build time and we're caching them by the checksum on some server. For this use-case the fast math being disabled is an appropriate setting.

But I see how it can be beneficial to turn the fast math on if used in an app/game at runtime. I like your approach to using a build flag for it so the user of the lib can decide what to use.

slipher commented 5 months ago

What kind of "reproducibility" are we talking about here?

If you mean that a crunch built with the exact same compiler for the same target platform always produces the same results, I expect that should happen with any floating point options.
If you mean that a crunch built by any compiler for any platform produces the same results, that's probably a bad goal. You'd have to do very slow things like 100% IEEE conformance and -ffloat-store.

It sounds like blaztinn had a very specific use case like "we want to hit the cache most of the time when building with any of the 3 versions of the compiler devs in our shop have installed right now", which doesn't generalize to most users.

illwieckz commented 5 months ago

I would prefer that building the release packages from the same source produces the same packages, be it on Linux on amd64 or on macOS on arm64.

illwieckz commented 5 months ago

Another good example of how fast math may produce different things while not being wrong:

https://stackoverflow.com/questions/6430448/why-doesnt-gcc-optimize-aaaaaa-to-aaaaaa

Q:

What I am curious about is that when I replaced pow(a,6) with a*a*a*a*a*a using GCC 4.5.1 and options "-O3 -lm -funroll-loops -msse4", it uses 5 mulsd instructions:
movapd  %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
while if I write (a*a*a)*(a*a*a), it will produce
movapd  %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm13, %xmm13
which reduces the number of multiply instructions to 3.

A:

Because Floating Point Math is not Associative. The way you group the operands in floating point multiplication has an effect on the numerical accuracy of the answer. As a result, most compilers are very conservative about reordering floating point calculations unless they can be sure that the answer will stay the same, or unless you tell them you don't care about numerical accuracy. For example: the -fassociative-math option of gcc which allows gcc to reassociate floating point operations, or even the -ffast-math option which allows even more aggressive tradeoffs of accuracy against speed.
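The two groupings from the question can be compared directly. The following Python sketch is an illustration only: the function names, the sample count and the input range are assumptions, and Python doubles stand in for the `mulsd` arithmetic shown above.

```python
import random

def pow6_sequential(a):
    """a*a*a*a*a*a evaluated left to right: five multiplications."""
    return a * a * a * a * a * a

def pow6_grouped(a):
    """(a*a*a)*(a*a*a): three multiplications once the cube is reused."""
    cube = a * a * a
    return cube * cube

def compare_groupings(samples=10000, seed=0):
    """Return (number of differing results, largest relative gap seen)."""
    rng = random.Random(seed)
    mismatches = 0
    worst_gap = 0.0
    for _ in range(samples):
        a = rng.uniform(1.0, 2.0)
        seq = pow6_sequential(a)
        grp = pow6_grouped(a)
        if seq != grp:
            mismatches += 1
            worst_gap = max(worst_gap, abs(seq - grp) / grp)
    return mismatches, worst_gap

mismatches, worst_gap = compare_groupings()
print("inputs with different results:", mismatches)
print("largest relative gap:", worst_gap)
```

Neither grouping is wrong: when the results differ, they do so only in the last few bits. That is still enough to break bit-for-bit reproducibility if one compiler is allowed to regroup the expression and another is not.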

It is probably not a big problem to disable fast math to produce a release build of packages, while enabling fast math to produce nightly builds of the same packages.

illwieckz commented 5 months ago

In #51 I made the use of fast math optional (and also made sure it is applied with MSVC when enabled, unlike before):

https://github.com/DaemonEngine/crunch/pull/51

Now another question is: should this option be enabled by default or not?

Maybe enabling it by default is not bad: people with a specific need for reproducibility would look at the available options anyway.

slipher commented 5 months ago

Getting the same results on all systems with SSE instructions might be doable. The GCC default float handling, Visual Studio 2022+ /fp:precise, or Visual Studio (any version) /fp:strict should result in more or less direct translations of the source to SSE. It's the x87 extended precision stuff that makes things really bad.

"It is probably not a big problem to disable fast math to produce a release build of packages, while enabling fast math to produce nightly builds of the same packages."

Doesn't sound like a great idea to me. Tests with a testing build become less relevant the more differences there are from a release build.

illwieckz commented 5 months ago

I made the USE_FAST_MATH option introduced in #51 enabled by default.

Having the option makes it easy to disable fast math for those with specific needs, while most people will just be happy to benefit from the fastest tool.

DaemonEngine / crunch

Make possible to not use fast math #29