MLton / mlton

The MLton repository
http://mlton.org
Other
962 stars 127 forks source link

Result of Real.Math.pow() function for same inputs changes after loading a DLL #540

Closed pclayton closed 10 months ago

pclayton commented 11 months ago

I am posting my response to this issue reported on the MLton-user mailing list here as the mailing list appears to be discarding my emails containing this response. This may be useful if the same issue arises in future. Please feel free to close this issue.

I modified David's example by replacing Real.Math.pow with call to my own C function c_pow that calls pow from math.h. This shows exactly the same issue as David's example. Using gdb to inspect the FPU state before each call to pow shows that the PC (Precision Control) bits of the x87 FPU control word differ after libmx is loaded. Before, PC is Extended Precision (64-bits) but after, PC is Double Precision (53-bits) as shown in the following gdb transcript:

(gdb) run
Starting program: ...\test_2.exe
[New Thread 9256.0x6354]
[New Thread 9256.0x2e00]
[New Thread 9256.0x63f8]

Thread 1 hit Breakpoint 1, c_pow (x=9, y=3.5)
    at ...\c_pow.c:6
6           res = pow (x, y);
(gdb) info float
  R7: Valid   0x00000000000000000000 +0
  R6: Valid   0x00000000000000000000 +0
  R5: Valid   0x00000000000000000000 +0
  R4: Valid   0x00000000000000000000 +0
  R3: Valid   0x00000000000000000000 +0
  R2: Valid   0x00000000000000000000 +0
  R1: Valid   0x00000000000000000000 +0
=>R0: Valid   0x00000000000000000000 +0

Status Word:         0x0000
                       TOP: 0
Control Word:        0x037f   IM DM ZM OM UM PM
                       PC: Extended Precision (64-bits)
                       RC: Round to nearest
Tag Word:            0x0000
Instruction Pointer: 0x00:0x00000000
Operand Pointer:     0x00:0x00000000
Opcode:              0x0000
(gdb) c
Continuing.
pow(9.0, 3.5) = 2187 = 2187
warning: `C:\Program Files\MATLAB\R2019b\bin\win64\icudt61.dll': Shared library architecture i386 is not compatible with target architecture i386:x86-64.
libmx loaded

Thread 1 hit Breakpoint 1, c_pow (x=9, y=3.5)
    at ...\c_pow.c:6
6           res = pow (x, y);
(gdb) info float
  R7: Valid   0x400a88b0000000000001 +2187
  R6: Valid   0x400a88b0000000000001 +2187
  R5: Valid   0x3fff8000000000000000 +1
  R4: Valid   0x40028000000000000000 +8
  R3: Valid   0x00000000000000000000 +0
  R2: Valid   0x00000000000000000000 +0
  R1: Valid   0x00000000000000000000 +0
=>R0: Valid   0x00000000000000000000 +0

Status Word:         0x0000
                       TOP: 0
Control Word:        0x127f   IM DM ZM OM UM PM
                       PC: Double Precision (53-bits)
                       RC: Round to nearest
Tag Word:            0x0000
Instruction Pointer: 0x00:0x00000000
Operand Pointer:     0x00:0x00000000
Opcode:              0x0000
(gdb) c
Continuing.
pow(9.0, 3.5) = 2186.999999999999 = 2186
[Thread 9256.0x63f8 exited with code 0]
[Thread 9256.0x6354 exited with code 0]
[Thread 9256.0x2e00 exited with code 0]
[Inferior 1 (process 9256) exited normally]

The precision control bits affect only the instructions FADD, FSUB, FMUL, FDIV, and their variants, and FSQRT. There is no general x87 'pow' instruction, so the usual approach to implement x ** y is 2 ** (y * log2 x).

Using gdb to step into the pow implementation and disassemble shows a couple of FMUL instructions and an FSQRT. With breakpoints set on these, only one FMUL instruction is reached for the input values we have:

   0x0000000064f82519 <+937>:   fldl   0x38(%rsp)
   0x0000000064f8251d <+941>:   fstpt  0x40(%rsp)
   0x0000000064f82521 <+945>:   callq  0x64f82be0 <log2l>
   0x0000000064f82526 <+950>:   fldt   0x50(%rsp)
   0x0000000064f8252a <+954>:   mov    %r13,%rdx
   0x0000000064f8252d <+957>:   mov    %r12,%rcx
   0x0000000064f82530 <+960>:   mov    %rbx,0x38(%rsp)
   0x0000000064f82535 <+965>:   fldl   0x38(%rsp)
=> 0x0000000064f82539 <+969>:   fmulp  %st,%st(1)
   0x0000000064f8253b <+971>:   fstpt  0x40(%rsp)
   0x0000000064f8253f <+975>:   callq  0x64f82a80 <exp2l>
   0x0000000064f82544 <+980>:   fldt   0x50(%rsp)
   0x0000000064f82548 <+984>:   fstpl  0x38(%rsp)
   0x0000000064f8254c <+988>:   movsd  0x38(%rsp),%xmm6

For inputs x = 9.0 and y = 3.5 we have log2 x = 3.169925... so the computaion y * log2 x would be 3.5 * 3.169925....

For the first call to pow we have the following FPU state before and after the FMUL instruction:

(gdb) info float
  R7: Valid   0x4000cae00d1cfdeb43d0 +3.169925001442312363
=>R6: Valid   0x4000e000000000000000 +3.5
  R5: Valid   0x40028000000000000000 +8
  R4: Valid   0x40028000000000000000 +8
  R3: Empty   0x00000000000000000000
  R2: Valid   0x00000000000000000000 +0
  R1: Valid   0x00000000000000000000 +0
  R0: Valid   0x00000000000000000000 +0

Status Word:         0xc03020                  PE
                       TOP: 6
Control Word:        0x3020037f   IM DM ZM OM UM PM
                       PC: Extended Precision (64-bits)
                       RC: Round to nearest
Tag Word:            0x00c0
Instruction Pointer: 0x00:0x64f82535
Operand Pointer:     0x00:0x00000000
Opcode:              0x0000
(gdb) si
0x0000000064f8253b      214     in D:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/math/x86/pow.def.h
(gdb) info float
=>R7: Valid   0x4002b1840b795e2ddb56 +11.09473750504809327
  R6: Valid   0x4000e000000000000000 +3.5
  R5: Valid   0x40028000000000000000 +8
  R4: Valid   0x40028000000000000000 +8
  R3: Special 0x00000000000000000000 +0
  R2: Valid   0x00000000000000000000 +0
  R1: Valid   0x00000000000000000000 +0
  R0: Valid   0x00000000000000000000 +0

Status Word:         0x803820                  PE
                       TOP: 7
Control Word:        0x3820037f   IM DM ZM OM UM PM
                       PC: Extended Precision (64-bits)
                       RC: Round to nearest
Tag Word:            0x0080
Instruction Pointer: 0x00:0x64f82539
Operand Pointer:     0x00:0x00000000
Opcode:              0x0000

For the second call to pow we have the following FPU state before and after the FMUL instruction:

(gdb) info float
  R7: Valid   0x4000cae00d1cfdeb43d0 +3.169925001442312363
=>R6: Valid   0x4000e000000000000000 +3.5
  R5: Valid   0x40028000000000000000 +8
  R4: Valid   0x40028000000000000000 +8
  R3: Empty   0x00000000000000000000
  R2: Valid   0x00000000000000000000 +0
  R1: Valid   0x00000000000000000000 +0
  R0: Valid   0x00000000000000000000 +0

Status Word:         0xc03020                  PE
                       TOP: 6
Control Word:        0x3020127f   IM DM ZM OM UM PM
                       PC: Double Precision (53-bits)
                       RC: Round to nearest
Tag Word:            0x00c0
Instruction Pointer: 0x00:0x64f82535
Operand Pointer:     0x00:0x00000000
Opcode:              0x0000
(gdb) si
0x0000000064f8253b      214     in D:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/math/x86/pow.def.h
(gdb) info float
=>R7: Valid   0x4002b1840b795e2dd800 +11.09473750504809253
  R6: Valid   0x4000e000000000000000 +3.5
  R5: Valid   0x40028000000000000000 +8
  R4: Valid   0x40028000000000000000 +8
  R3: Special 0x00000000000000000000 +0
  R2: Valid   0x00000000000000000000 +0
  R1: Valid   0x00000000000000000000 +0
  R0: Valid   0x00000000000000000000 +0

Status Word:         0x803820                  PE
                       TOP: 7
Control Word:        0x3820127f   IM DM ZM OM UM PM
                       PC: Double Precision (53-bits)
                       RC: Round to nearest
Tag Word:            0x0080
Instruction Pointer: 0x00:0x64f82539
Operand Pointer:     0x00:0x00000000
Opcode:              0x0000

Before both executions of the FMUL instruction, we have the same inputs:

  st(0) = +3.5
  st(1) = +3.169925001442312363

(which are the values of y and log2 x).

However, the results differ:

  st(0) = +11.09473750504809327  with PC = Extended Precision (64-bits)
  st(0) = +11.09473750504809253  with PC = Double Precision (53-bits)
                            ^^^

All other flags in the FPU control word are the same, including RC (Rounding Control) which is Round to nearest in both cases. Note that the FPU control word has 16 bits, even if gdb prints extra bits, and bit 12 - IC (Infinity Control) - is ignored after the 287 coprocessor.

I have updated David's example to restore the FPU control word after libmx is loaded which fixes the floating-point loss of precision. The output is:

pow(9.0, 3.5) = 2187 = 2187
FPU control word = 0000001101111111
libmx loaded
FPU control word = 0001001001111111
pow(9.0, 3.5) = 2186.999999999999 = 2186
FPU control word = 0000001101111111
pow(9.0, 3.5) = 2187 = 2187

The example test_3.zip is attached, which is for building in msys2.

MatthewFluet commented 11 months ago

Thank you for the thorough analysis!

For some additional context, the problematic DLL is LIBMX.DLL from MATLAB R2019b or earlier; MATLAB R2020a or later does not change the PC bit of the FPU control word.