Open ikirill opened 7 years ago
FWIW, glibc fmaf
got this right. (And it also uses hardware instructions when available...)
For reference, this is what glibc seems to be doing, if I translated the code right (in works in my tests):
function get_FE_INEXACT()
# https://github.com/JuliaLang/julia/blob/2e6d908cea8334e875ea4d22d771a6eab0eef8e7/src/jlapi.c#L413
u = zeros(Cint, 9) + typemin(Cint)
ccall(:jl_get_fenv_consts, Void, (Ptr{Cint},), u)
myid() > 1 || println(STDERR, "jl_get_fenv_consts: ", u)
u[1]
end
const FE_INEXACT = get_FE_INEXACT()
fetestexcept(excepts) = ccall((:fetestexcept, :libc), Cint, (Cint,), excepts)
feclearexcept(excepts) = ccall((:feclearexcept, :libc), Cint, (Cint,), excepts) == 0 || error("feclearexcept")
feinexact() = fetestexcept(FE_INEXACT) != 0
function glibc_fmaf(x::Float32, y::Float32, z::Float32)
temp = Float64(x) * Float64(y)
if temp == Float64(-z)
return Float32(temp) + z
end
setrounding(Float64, RoundToZero)
feclearexcept(FE_INEXACT)
u = temp + Float64(z)
inexact = feinexact()
setrounding(Float64, RoundNearest)
w = reinterpret(UInt64, u)
if inexact && (w & 0x1) == 0 && ((w & 0x7ff0_0000_0000_0000) >> 52) != 0x7ff
Float32(reinterpret(Float64, w | 0x1))
else
Float32(u)
end
end
One idea I had is that the result - xy == z
is not actually a correct way to test whether the sum xy + z
was computed exactly, and replace the test with computing the correct error using the double-double 2sum algorithm and checking that the known correct error is zero instead. This is different from what glibc does, which seems based on a different concept entirely (but maybe not, on further thought I'm not sure; maybe all I'm doing is emulating the FE_INEXACT flag). Then it no longer seems to fail (I tested it on 2^35 random values, which is good enough to trigger the original bug):
# Hopefully this causes it to correctly round to zero
@noinline function add_rtz(x::Float64, y::Float64)
ccall(:fesetround, Cint, (Cint,), Base.Rounding.JL_FE_TOWARDZERO)
w = x + y
ccall(:fesetround, Cint, (Cint,), Base.Rounding.JL_FE_TONEAREST)
return w
end
# This is a translation of openlibm's fmaf
function fma_libm_modified(x::Float32, y::Float32, z::Float32)
xy = Float64(x) * Float64(y)
# The original openlibm code:
# result = xy + Float64(z)
# exact = result - xy == z
# A correct test for whether xy+z was computed exactly
# This is the double-double 2sum algorithm
result = xy + Float64(z)
temp = result - xy
# order of parens is important:
result_err = (xy - (result - temp)) + (Float64(z) - temp)
exact = result_err == 0.0
w = reinterpret(UInt64, result)
if (w & 0x1fffffff) != 0x10000000 || (w & 0x7ff0000000000000) == 0x7ff0000000000000 || exact
return Float32(result)
else
adjusted_result = add_rtz(xy, Float64(z))
if result == adjusted_result
return Float32(reinterpret(Float64, w + 1))
else
return Float32(adjusted_result)
end
end
end
We can't really use glibc code for licensing reasons. musl code shouldn't be a problem.
@simonbyrne Right, sorry, I wasn't thinking about licenses when I posted that, I was just curious what the right approach was. They both seem to be based on fma emulation through rounding-to-odd, like this: Emulation of a FMA and correctly-rounded sums: proved algorithms using rounding to odd by Sylvie Boldo, Guillaume Melquiond
In Julia, https://github.com/JuliaLang/julia/pull/42783 fixes the software fma
.
➜ ~ ./julia/julia
_
_ _ _(_)_ | Documentation: https://docs.julialang.org
(_) | (_) (_) |
_ _ _| |_ __ _ | Type "?" for help, "]?" for Pkg help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 1.8.0-DEV.894 (2021-11-05)
_/ |\__'_|_|_|\__'_| | Commit 7d41d1eb61 (4 days old master)
|__/ |
julia> Base.fma_llvm(0.9474001f0, 4.639901f-7, -0.24325085f0)
-0.24325041f0
julia> Base.fma(0.9474001f0, 4.639901f-7, -0.24325085f0)
-0.24325041f0
julia> Base.fma_emulated(0.9474001f0, 4.639901f-7, -0.24325085f0)
-0.24325041f0
Is the openlibm function fixed?
Huh - I thought this issue was about the Julia fma implementation, which is fixed. I see that the openlibm one is still broken.
This example was pointed out by njuffa on scicomp.stackexchange.com. I checked it also with a recent version of Z3 and exact rational arithmetic.