Open Quuxplusone opened 5 years ago
Attached a.c
(2092 bytes, text/x-csrc): Program that demonstrates the issue
Attached without_volitale.s
(8180 bytes, text/plain): Assembly when a.c is compiled without -DKLUDGE
Attached with_volitale.s
(8179 bytes, text/plain): Assembly when a.c is compiled with -DKLUDGE
Changed the "Importance" field to "normal".
A discussion of the bug can be found in the FreeBSD toolchain mailing list archive at
https://lists.freebsd.org/pipermail/freebsd-toolchain/2019-March/004458.html
Andy Kaylor and I here at Intel spent a great deal of time playing around with this today using both clang and gcc. I don't believe the precision being set to 53 is necessary to hit the issue. I was able to get the test to report errors on the normal Linux configuration.
As I think we mentioned on the FreeBSD mailing list, we are passing x and y to dp_csinh without rounding to float after the additions. We spill them around the call to csinhf using a 64-bit stack slot. This happens because the X87 codegen treats an fp_extend as nothing more than a copy from the float register class to the double register class, and the register coalescer then rewrites some of the machine IR to use the double register class. As a result, the spill slot size is calculated as 64 bits.
Using volatile circumvents this and forces a store as float instead of the spill. This causes the value to be rounded to float before being extended to double.
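To illustrate the effect of the volatile store, here is a minimal sketch. It is not taken from the attached a.c; the helper names round_to_float and widen_sum are hypothetical. It only shows the general pattern: a float sum is forced through a float-sized memory slot before being widened to double.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from a.c): push a value through a volatile
 * float so the compiler must store it as a 32-bit float and reload it.
 * Any excess precision held in an x87 register is discarded by the store.
 */
static float round_to_float(float v)
{
    volatile float tmp = v;
    return tmp;
}

/*
 * Hypothetical stand-in for the pattern discussed above: add two floats,
 * round the sum to float, then extend it to double.
 */
static double widen_sum(float a, float b)
{
    float x = round_to_float(a + b);
    return (double)x;
}
```

With this in place, widen_sum(1.0f, 0x1p-30f) yields exactly 1.0, because 1 + 2^-30 is not representable as a float and the store rounds it. Without the rounding step, an x87 build may carry the unrounded sum into the double.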
gcc's codegen seems to be affected by -std=c11 vs -std=gnu11. I believe the difference is really -fexcess-precision=standard vs -fexcess-precision=fast. When -fexcess-precision=fast is in effect, gcc's maximum ULP gets worse if dp_csinh is called before csinhf, though it does not exceed the limit of 21.
It looks as though -fexcess-precision=standard causes gcc to insert intermediate casts to long double in its intermediate representation, at least on Linux. That's what I observed from dumping the output of the 004t.original in godbolt. They might be casts to double on FreeBSD when the precision is 53 bits. I assume this has some effect on how instructions are generated later.
Another interesting note: clang does set FLT_EVAL_METHOD to 1 instead of 2 on some versions of NetBSD without SSE, but not for FreeBSD, and we don't seem to do anything with the information other than set the define.
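For anyone wanting to check what a given compiler and target report, a small sketch along these lines prints FLT_EVAL_METHOD from <float.h>. The helper names eval_method_name and print_eval_method are made up for illustration; the meanings of the values 0, 1 and 2 are the ones given in the C standard.

```c
#include <float.h>
#include <stdio.h>

/*
 * Hypothetical helper: map an FLT_EVAL_METHOD value to the type in which
 * float and double operations are evaluated, per the C standard.
 */
static const char *eval_method_name(int method)
{
    switch (method) {
    case 0:  return "each operand's own type";
    case 1:  return "double";
    case 2:  return "long double";
    default: return "indeterminable or implementation-defined";
    }
}

/* Print the value the compiler reports for the current target. */
static void print_eval_method(void)
{
    printf("FLT_EVAL_METHOD = %d (%s)\n",
           (int)FLT_EVAL_METHOD, eval_method_name((int)FLT_EVAL_METHOD));
}
```

Calling print_eval_method() from a build with and without SSE (for example, with -m32 -mno-sse, assuming the toolchain supports that target) should show whether the define changes.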
(In reply to Craig Topper from comment #5)
> Myself and Andy Kaylor here at Intel spent a great deal of time playing
> around with this today using both clang and gcc. I don't believe the
> precision being set to 53 is necessary to hit the issue. I was able to get
> the test to report errors on the normal linux configuration.
Craig,
Thanks for taking a look at this issue. FreeBSD on i386/387
has been using 53-bit precision for likely more than 2 decades.
To save you some code spelunking, one finds (sorry about long url)
https://svnweb.freebsd.org/base/head/sys/x86/include/fpu.h?revision=314436&view=markup
lines 186-208 are
/*
 * The hardware default control word for i387's and later coprocessors is
 * 0x37F, giving:
 *
 *	round to nearest
 *	64-bit precision
 *	all exceptions masked.
 *
 * FreeBSD/i386 uses 53 bit precision for things like fadd/fsub/fsqrt etc
 * because of the difference between memory and fpu register stack arguments.
 * If its using an intermediate fpu register, it has 80/64 bits to work
 * with. If it uses memory, it has 64/53 bits to work with. However,
 * gcc is aware of this and goes to a fair bit of trouble to make the
 * best use of it.
 *
 * This is mostly academic for AMD64, because the ABI prefers the use
 * SSE2 based math. For FreeBSD/amd64, we go with the default settings.
 */
#define __INITIAL_FPUCW__ 0x037F
#define __INITIAL_FPUCW_I386__ 0x127F
#define __INITIAL_NPXCW__ __INITIAL_FPUCW_I386__
#define __INITIAL_MXCSR__ 0x1F80
#define __INITIAL_MXCSR_MASK__ 0xFFBF
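To make the difference between 0x037F and 0x127F concrete, the sketch below decodes the relevant x87 control word fields. The function names are hypothetical; the bit positions (exception masks in bits 0-5, precision control in bits 8-9, rounding control in bits 10-11) follow the x87 control word layout.

```c
#include <stdint.h>

/*
 * Hypothetical helper: significand precision selected by the
 * precision-control field (bits 8-9) of an x87 control word.
 * Returns -1 for the reserved encoding 01b.
 */
static int x87_precision_bits(uint16_t cw)
{
    switch ((cw >> 8) & 0x3) {
    case 0x0: return 24;   /* single precision */
    case 0x2: return 53;   /* double precision */
    case 0x3: return 64;   /* double extended precision */
    default:  return -1;   /* reserved */
    }
}

/* Rounding-control field (bits 10-11); 0 means round to nearest. */
static int x87_rounding_mode(uint16_t cw)
{
    return (cw >> 10) & 0x3;
}

/* Nonzero when all six exception mask bits (bits 0-5) are set. */
static int x87_all_exceptions_masked(uint16_t cw)
{
    return (cw & 0x3F) == 0x3F;
}
```

Applied to the values above, x87_precision_bits(0x037F) gives 64 and x87_precision_bits(0x127F) gives 53, while both words select round to nearest with all exceptions masked.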
The comment above that refers to GCC knowing about this setting can
be traced to
https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/freebsd.h?revision=267494&view=markup
lines 120-123 are
/* FreeBSD sets the rounding precision of the FPU to 53 bits. Let the
compiler get the contents of <float.h> and std::numeric_limits correct. */
#undef TARGET_96_ROUND_53_LONG_DOUBLE
#define TARGET_96_ROUND_53_LONG_DOUBLE (!TARGET_64BIT)
I've looked through GCC sources, and know the above affects the
processing of gcc/config/i386/i386-modes.def, but that's as far
as I'm able to go at the moment.
--
steve