JuliaMath / openlibm

High quality system independent, portable, open source libm implementation
https://openlibm.org
Other
523 stars 140 forks source link

fmaf is sometimes incorrect #160

Open ikirill opened 7 years ago

ikirill commented 7 years ago

This example was pointed out by njuffa on scicomp.stackexchange.com. I checked it also with a recent version of Z3 and exact rational arithmetic.

# This is the native cpu instruction
julia> fma(0.9474001f0, 4.639901f-7, -0.24325085f0)
-0.24325041f0

julia> Base.fma_libm(0.9474001f0, 4.639901f-7, -0.24325085f0)
-0.2432504f0

julia> versioninfo()
Julia Version 0.6.0
Commit 903644385b (2017-06-19 13:05 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
yuyichao commented 7 years ago

FWIW, glibc fmaf got this right. (And it also uses hardware instructions when available...)

ikirill commented 7 years ago

For reference, this is what glibc seems to be doing, if I translated the code right (in works in my tests):


function get_FE_INEXACT()
  # https://github.com/JuliaLang/julia/blob/2e6d908cea8334e875ea4d22d771a6eab0eef8e7/src/jlapi.c#L413
  u = zeros(Cint, 9) + typemin(Cint)
  ccall(:jl_get_fenv_consts, Void, (Ptr{Cint},), u)
  myid() > 1 || println(STDERR, "jl_get_fenv_consts: ", u)
  u[1]
end

const FE_INEXACT = get_FE_INEXACT()

fetestexcept(excepts) = ccall((:fetestexcept, :libc), Cint, (Cint,), excepts)
feclearexcept(excepts) = ccall((:feclearexcept, :libc), Cint, (Cint,), excepts) == 0 || error("feclearexcept")
feinexact() = fetestexcept(FE_INEXACT) != 0

function glibc_fmaf(x::Float32, y::Float32, z::Float32)
  temp = Float64(x) * Float64(y)
  if temp == Float64(-z)
    return Float32(temp) + z
  end

  setrounding(Float64, RoundToZero)
  feclearexcept(FE_INEXACT)
  u = temp + Float64(z)
  inexact = feinexact()
  setrounding(Float64, RoundNearest)
  w = reinterpret(UInt64, u)
  if inexact && (w & 0x1) == 0 && ((w & 0x7ff0_0000_0000_0000) >> 52) != 0x7ff
    Float32(reinterpret(Float64, w | 0x1))
  else
    Float32(u)
  end
end
ikirill commented 7 years ago

One idea I had is that the result - xy == z is not actually a correct way to test whether the sum xy + z was computed exactly, and replace the test with computing the correct error using the double-double 2sum algorithm and checking that the known correct error is zero instead. This is different from what glibc does, which seems based on a different concept entirely (but maybe not, on further thought I'm not sure; maybe all I'm doing is emulating the FE_INEXACT flag). Then it no longer seems to fail (I tested it on 2^35 random values, which is good enough to trigger the original bug):

# Hopefully this causes it to correctly round to zero
@noinline function add_rtz(x::Float64, y::Float64)
  ccall(:fesetround, Cint, (Cint,), Base.Rounding.JL_FE_TOWARDZERO)
  w = x + y
  ccall(:fesetround, Cint, (Cint,), Base.Rounding.JL_FE_TONEAREST)
  return w
end

# This is a translation of openlibm's fmaf
function fma_libm_modified(x::Float32, y::Float32, z::Float32)
  xy = Float64(x) * Float64(y)

  # The original openlibm code:
  # result = xy + Float64(z)
  # exact = result - xy == z

  # A correct test for whether xy+z was computed exactly
  # This is the double-double 2sum algorithm
  result = xy + Float64(z)
  temp = result - xy
  # order of parens is important:
  result_err = (xy - (result - temp)) + (Float64(z) - temp)
  exact = result_err == 0.0

  w = reinterpret(UInt64, result)

  if (w & 0x1fffffff) != 0x10000000 || (w & 0x7ff0000000000000) == 0x7ff0000000000000 || exact
    return Float32(result)
  else
    adjusted_result = add_rtz(xy, Float64(z))
    if result == adjusted_result
      return Float32(reinterpret(Float64, w + 1))
    else
      return Float32(adjusted_result)
    end
  end
end
simonbyrne commented 7 years ago

We can't really use glibc code for licensing reasons. musl code shouldn't be a problem.

ikirill commented 7 years ago

@simonbyrne Right, sorry, I wasn't thinking about licenses when I posted that, I was just curious what the right approach was. They both seem to be based on fma emulation through rounding-to-odd, like this: Emulation of a FMA and correctly-rounded sums: proved algorithms using rounding to odd by Sylvie Boldo, Guillaume Melquiond

ViralBShah commented 3 years ago

In Julia, https://github.com/JuliaLang/julia/pull/42783 fixes the software fma.

ViralBShah commented 3 years ago
➜  ~ ./julia/julia 
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0-DEV.894 (2021-11-05)
 _/ |\__'_|_|_|\__'_|  |  Commit 7d41d1eb61 (4 days old master)
|__/                   |

julia> Base.fma_llvm(0.9474001f0, 4.639901f-7, -0.24325085f0)
-0.24325041f0

julia> Base.fma(0.9474001f0, 4.639901f-7, -0.24325085f0)
-0.24325041f0

julia> Base.fma_emulated(0.9474001f0, 4.639901f-7, -0.24325085f0)
-0.24325041f0
yuyichao commented 3 years ago

Is the openlibm function fixed?

ViralBShah commented 3 years ago

Huh - I thought this issue was about the Julia fma implementation, which is fixed. I see that the openlibm one is still broken.