clang does not set DAZ flag in -ffast-math mode #13777

Open Quuxplusone opened 12 years ago

Quuxplusone commented 12 years ago
Bugzilla Link PR14024
Status NEW
Importance P enhancement
Reported by Tobias Grosser (tobias@grosser.es)
Reported on 2012-10-04 13:16:28 -0700
Last modified on 2020-08-27 07:22:49 -0700
Version trunk
Hardware PC Linux
CC andrew.kaylor@intel.com, benny.kra@gmail.com, cameron.mcinally@nyu.edu, evandro.menezes@sifive.com, gonzalo.gadeschi@gmail.com, jonathan.sauer@gmx.de, llvm-bugs@lists.llvm.org, Matthew.Arsenault@amd.com, pawel@32bitmicro.com
Fixed by commit(s)
Attachments jacobi_1d.DenormalsAreZero.c (709 bytes, text/x-csrc)
Blocks
Blocked by
See also PR34994
Created attachment 9307
Test case

$ gcc jacobi_1d.DenormalsAreZero.c -O3
$ time ./a.out
real    0m20.164s

$ gcc jacobi_1d.DenormalsAreZero.c -O3 -ffast-math
$ time ./a.out
real    0m0.357s

$ clang jacobi_1d.DenormalsAreZero.c -O3
$ time ./a.out
real    0m36.660s

$ clang jacobi_1d.DenormalsAreZero.c -O3 -ffast-math
$ time ./a.out
real    0m36.431s

As can be seen, the gcc-produced binary is a lot faster than the clang one in
-ffast-math mode (besides being a little bit faster in general). This is not
caused by better optimizations, but by gcc linking a small function into the
resulting binary, which sets the DAZ flag in the MXCSR register.

From [1]:

"DAZ tells the CPU to force all Denormals to zero. A Denormal is a number that
is so small that FPU can't renormalize it due to limited exponent ranges.
They're just like normal numbers, but they take considerably longer to process.
Note that not all processors support DAZ."

The test case happens to calculate a lot of these close-to-zero values. Hence,
setting the flag has a big impact.

[1] http://softpixel.com/~cwright/programming/simd/sse.php
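
To illustrate what DAZ does to a single operation, here is a minimal sketch in C for x86 with SSE. The helper name sketch_double_subnormal is made up for this example, and the 0x0040 mask is the DAZ bit (bit 6) of MXCSR as documented by Intel. It assumes the CPU supports DAZ (setting the bit on a CPU without it faults) and that float arithmetic is compiled to SSE instructions, as is the default on x86-64.

```c
#include <float.h>      /* FLT_MIN */
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr */

/* Hypothetical helper: doubles a subnormal float with DAZ forced off or on,
 * then restores the previous MXCSR value. */
static float sketch_double_subnormal(int daz_on) {
    unsigned int saved = _mm_getcsr();
    /* 0x0040 is the DAZ bit (bit 6) of MXCSR. */
    _mm_setcsr(daz_on ? (saved | 0x0040u) : (saved & ~0x0040u));

    /* volatile keeps the compiler from folding the multiplication. */
    volatile float tiny = FLT_MIN / 2.0f;  /* a subnormal value */
    volatile float two = 2.0f;
    volatile float result = tiny * two;

    _mm_setcsr(saved);
    return result;
}
```

With DAZ off, the product is FLT_MIN again; with DAZ on, the subnormal input is read as zero, so the product is 0.0f. The speedup in the test case comes from the CPU skipping its slow path for such inputs.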
Quuxplusone commented 12 years ago

Attached jacobi_1d.DenormalsAreZero.c (709 bytes, text/x-csrc): Test case

Quuxplusone commented 12 years ago

This is a neat trick. GCC links in crtfastmath.o (part of libgcc) which sets the necessary bits. We should do the same in the clang driver and provide crtfastmath.o with compiler-rt.
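
A rough sketch of what such a startup object could look like on x86-64, for illustration only: this is not the libgcc source, the function name sketch_set_fast_math is made up, and it assumes a GCC-compatible compiler (for the constructor attribute) and a CPU that supports DAZ. The mask 0x8040 combines the FTZ bit (bit 15) and the DAZ bit (bit 6) of MXCSR.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr */

/* Runs before main() when this object is linked into the program.
 * Sets FTZ (0x8000) and DAZ (0x0040) in MXCSR for the startup thread. */
static void __attribute__((constructor)) sketch_set_fast_math(void) {
    _mm_setcsr(_mm_getcsr() | 0x8040u);
}
```

Because the work happens in a constructor, simply linking the object changes the floating-point mode of the whole program, with no source changes needed.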

Quuxplusone commented 12 years ago

r165240 makes clang link crtfastmath.o if it's available (only on Linux for now).

Quuxplusone commented 9 years ago

The pmmintrin.h header for SSE3 (included in clang) has the macro _MM_SET_DENORMALS_ZERO_MODE that sets DAZ. It doesn't require any SSE3 functionality to do so, only _mm_getcsr and _mm_setcsr, which are part of basic SSE. This might be an alternative when crtfastmath is not available (e.g. on Mac OS X).
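
As a minimal sketch of that alternative, the following uses only xmmintrin.h (basic SSE). The SKETCH_ macros and sketch_ function names are made up for this example; the bit values are the MXCSR DAZ and FTZ bits as documented by Intel. It assumes the CPU supports DAZ, and note that MXCSR is per-thread state, so the call only affects the thread that makes it.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr (basic SSE) */

/* MXCSR bits; the macro names are local to this sketch. */
#define SKETCH_MXCSR_DAZ 0x0040u  /* bit 6: treat subnormal inputs as zero */
#define SKETCH_MXCSR_FTZ 0x8000u  /* bit 15: flush subnormal results to zero */

/* Turn on both DAZ and FTZ for the calling thread. */
static inline void sketch_enable_daz_ftz(void) {
    _mm_setcsr(_mm_getcsr() | SKETCH_MXCSR_DAZ | SKETCH_MXCSR_FTZ);
}

/* Report whether DAZ is currently set for the calling thread. */
static inline int sketch_daz_enabled(void) {
    return (_mm_getcsr() & SKETCH_MXCSR_DAZ) != 0;
}
```

A program would call sketch_enable_daz_ftz() once near the top of main(), and again in each worker thread where the mode is wanted.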

Quuxplusone commented 4 years ago

Should FTZ also be set, e.g. when -fno-signed-zeros is enabled (e.g. via -ffast-math)?

Quuxplusone commented 4 years ago

Depends on the target, but crtfastmath.o should set both DAZ and FTZ, e.g. on x86-64.

NoSignedZeros is orthogonal to DAZ and FTZ, except maybe for AMDGPU (I think, not sure) and other targets that have special flushing modes.

Also, Andy Kaylor just suggested an LLVM-specific way to set these flags at runtime through compiler-rt.

Quuxplusone commented 4 years ago

I believe this should set the right FTZ/DAZ flags since commit fa7cd549d604bfd8f9dce5d649a19720cbc39cca