I am posting my response to this issue reported on the MLton-user mailing list here as the mailing list appears to be discarding my emails containing this response. This may be useful if the same issue arises in future. Please feel free to close this issue.
I modified David's example by replacing Real.Math.pow with call to my own C function c_pow that calls pow from math.h. This shows exactly the same issue as David's example. Using gdb to inspect the FPU state before each call to pow shows that the PC (Precision Control) bits of the x87 FPU control word differ after libmx is loaded. Before, PC is Extended Precision (64-bits) but after, PC is Double Precision (53-bits) as shown in the following gdb transcript:
(gdb) run
Starting program: ...\test_2.exe
[New Thread 9256.0x6354]
[New Thread 9256.0x2e00]
[New Thread 9256.0x63f8]
Thread 1 hit Breakpoint 1, c_pow (x=9, y=3.5)
at ...\c_pow.c:6
6 res = pow (x, y);
(gdb) info float
R7: Valid 0x00000000000000000000 +0
R6: Valid 0x00000000000000000000 +0
R5: Valid 0x00000000000000000000 +0
R4: Valid 0x00000000000000000000 +0
R3: Valid 0x00000000000000000000 +0
R2: Valid 0x00000000000000000000 +0
R1: Valid 0x00000000000000000000 +0
=>R0: Valid 0x00000000000000000000 +0
Status Word: 0x0000
TOP: 0
Control Word: 0x037f IM DM ZM OM UM PM
PC: Extended Precision (64-bits)
RC: Round to nearest
Tag Word: 0x0000
Instruction Pointer: 0x00:0x00000000
Operand Pointer: 0x00:0x00000000
Opcode: 0x0000
(gdb) c
Continuing.
pow(9.0, 3.5) = 2187 = 2187
warning: `C:\Program Files\MATLAB\R2019b\bin\win64\icudt61.dll': Shared library architecture i386 is not compatible with target architecture i386:x86-64.
libmx loaded
Thread 1 hit Breakpoint 1, c_pow (x=9, y=3.5)
at ...\c_pow.c:6
6 res = pow (x, y);
(gdb) info float
R7: Valid 0x400a88b0000000000001 +2187
R6: Valid 0x400a88b0000000000001 +2187
R5: Valid 0x3fff8000000000000000 +1
R4: Valid 0x40028000000000000000 +8
R3: Valid 0x00000000000000000000 +0
R2: Valid 0x00000000000000000000 +0
R1: Valid 0x00000000000000000000 +0
=>R0: Valid 0x00000000000000000000 +0
Status Word: 0x0000
TOP: 0
Control Word: 0x127f IM DM ZM OM UM PM
PC: Double Precision (53-bits)
RC: Round to nearest
Tag Word: 0x0000
Instruction Pointer: 0x00:0x00000000
Operand Pointer: 0x00:0x00000000
Opcode: 0x0000
(gdb) c
Continuing.
pow(9.0, 3.5) = 2186.999999999999 = 2186
[Thread 9256.0x63f8 exited with code 0]
[Thread 9256.0x6354 exited with code 0]
[Thread 9256.0x2e00 exited with code 0]
[Inferior 1 (process 9256) exited normally]
The precision control bits affect only the instructions FADD, FSUB, FMUL, FDIV, and their variants, and FSQRT. There is no general x87 'pow' instruction, so the usual approach to implement x ** y is 2 ** (y * log2 x).
Using gdb to step into the pow implementation and disassemble shows a couple of FMUL instructions and an FSQRT. With breakpoints set on these, only one FMUL instruction is reached for the input values we have:
For inputs x = 9.0 and y = 3.5 we have log2 x = 3.169925... so the computaion y * log2 x would be 3.5 * 3.169925....
For the first call to pow we have the following FPU state before and after the FMUL instruction:
(gdb) info float
R7: Valid 0x4000cae00d1cfdeb43d0 +3.169925001442312363
=>R6: Valid 0x4000e000000000000000 +3.5
R5: Valid 0x40028000000000000000 +8
R4: Valid 0x40028000000000000000 +8
R3: Empty 0x00000000000000000000
R2: Valid 0x00000000000000000000 +0
R1: Valid 0x00000000000000000000 +0
R0: Valid 0x00000000000000000000 +0
Status Word: 0xc03020 PE
TOP: 6
Control Word: 0x3020037f IM DM ZM OM UM PM
PC: Extended Precision (64-bits)
RC: Round to nearest
Tag Word: 0x00c0
Instruction Pointer: 0x00:0x64f82535
Operand Pointer: 0x00:0x00000000
Opcode: 0x0000
(gdb) si
0x0000000064f8253b 214 in D:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/math/x86/pow.def.h
(gdb) info float
=>R7: Valid 0x4002b1840b795e2ddb56 +11.09473750504809327
R6: Valid 0x4000e000000000000000 +3.5
R5: Valid 0x40028000000000000000 +8
R4: Valid 0x40028000000000000000 +8
R3: Special 0x00000000000000000000 +0
R2: Valid 0x00000000000000000000 +0
R1: Valid 0x00000000000000000000 +0
R0: Valid 0x00000000000000000000 +0
Status Word: 0x803820 PE
TOP: 7
Control Word: 0x3820037f IM DM ZM OM UM PM
PC: Extended Precision (64-bits)
RC: Round to nearest
Tag Word: 0x0080
Instruction Pointer: 0x00:0x64f82539
Operand Pointer: 0x00:0x00000000
Opcode: 0x0000
For the second call to pow we have the following FPU state before and after the FMUL instruction:
(gdb) info float
R7: Valid 0x4000cae00d1cfdeb43d0 +3.169925001442312363
=>R6: Valid 0x4000e000000000000000 +3.5
R5: Valid 0x40028000000000000000 +8
R4: Valid 0x40028000000000000000 +8
R3: Empty 0x00000000000000000000
R2: Valid 0x00000000000000000000 +0
R1: Valid 0x00000000000000000000 +0
R0: Valid 0x00000000000000000000 +0
Status Word: 0xc03020 PE
TOP: 6
Control Word: 0x3020127f IM DM ZM OM UM PM
PC: Double Precision (53-bits)
RC: Round to nearest
Tag Word: 0x00c0
Instruction Pointer: 0x00:0x64f82535
Operand Pointer: 0x00:0x00000000
Opcode: 0x0000
(gdb) si
0x0000000064f8253b 214 in D:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/math/x86/pow.def.h
(gdb) info float
=>R7: Valid 0x4002b1840b795e2dd800 +11.09473750504809253
R6: Valid 0x4000e000000000000000 +3.5
R5: Valid 0x40028000000000000000 +8
R4: Valid 0x40028000000000000000 +8
R3: Special 0x00000000000000000000 +0
R2: Valid 0x00000000000000000000 +0
R1: Valid 0x00000000000000000000 +0
R0: Valid 0x00000000000000000000 +0
Status Word: 0x803820 PE
TOP: 7
Control Word: 0x3820127f IM DM ZM OM UM PM
PC: Double Precision (53-bits)
RC: Round to nearest
Tag Word: 0x0080
Instruction Pointer: 0x00:0x64f82539
Operand Pointer: 0x00:0x00000000
Opcode: 0x0000
Before both executions of the FMUL instruction, we have the same inputs:
st(0) = +3.5
st(1) = +3.169925001442312363
(which are the values of y and log2 x).
However, the results differ:
st(0) = +11.09473750504809327 with PC = Extended Precision (64-bits)
st(0) = +11.09473750504809253 with PC = Double Precision (53-bits)
^^^
All other flags in the FPU control word are the same, including RC (Rounding Control) which is Round to nearest in both cases. Note that the FPU control word has 16 bits, even if gdb prints extra bits, and bit 12 - IC (Infinity Control) - is ignored after the 287 coprocessor.
I have updated David's example to restore the FPU control word after libmx is loaded which fixes the floating-point loss of precision. The output is:
pow(9.0, 3.5) = 2187 = 2187
FPU control word = 0000001101111111
libmx loaded
FPU control word = 0001001001111111
pow(9.0, 3.5) = 2186.999999999999 = 2186
FPU control word = 0000001101111111
pow(9.0, 3.5) = 2187 = 2187
The example test_3.zip is attached, which is for building in msys2.
For some additional context, the problematic DLL is LIBMX.DLL from MATLAB R2019b or earlier; MATLAB R2020a or later does not change the PC bit of the FPU control word.
I am posting my response to this issue reported on the MLton-user mailing list here as the mailing list appears to be discarding my emails containing this response. This may be useful if the same issue arises in future. Please feel free to close this issue.
I modified David's example by replacing
Real.Math.pow
with call to my own C functionc_pow
that callspow
frommath.h
. This shows exactly the same issue as David's example. Using gdb to inspect the FPU state before each call topow
shows that the PC (Precision Control) bits of the x87 FPU control word differ after libmx is loaded. Before, PC isExtended Precision (64-bits)
but after, PC isDouble Precision (53-bits)
as shown in the following gdb transcript:The precision control bits affect only the instructions FADD, FSUB, FMUL, FDIV, and their variants, and FSQRT. There is no general x87 'pow' instruction, so the usual approach to implement
x ** y
is2 ** (y * log2 x)
.Using gdb to step into the
pow
implementation and disassemble shows a couple of FMUL instructions and an FSQRT. With breakpoints set on these, only one FMUL instruction is reached for the input values we have:For inputs
x = 9.0
andy = 3.5
we havelog2 x = 3.169925...
so the computaiony * log2 x
would be3.5 * 3.169925...
.For the first call to
pow
we have the following FPU state before and after the FMUL instruction:For the second call to
pow
we have the following FPU state before and after the FMUL instruction:Before both executions of the FMUL instruction, we have the same inputs:
(which are the values of
y
andlog2 x
).However, the results differ:
All other flags in the FPU control word are the same, including RC (Rounding Control) which is Round to nearest in both cases. Note that the FPU control word has 16 bits, even if gdb prints extra bits, and bit 12 - IC (Infinity Control) - is ignored after the 287 coprocessor.
I have updated David's example to restore the FPU control word after libmx is loaded which fixes the floating-point loss of precision. The output is:
The example test_3.zip is attached, which is for building in msys2.