yurivish opened this issue 1 year ago
I’ve tracked the invocations down to this definition of div, which has existed at least as far back as Julia 1.6:
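div(x::T, y::T, r::RoundingMode) where {T<:AbstractFloat} = convert(T,round((x-rem(x,y,r))/y))  # the definition under discussion, as quoted again later in this thread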
This was moved from base/operators.jl in this PR - see the diff here. The method seems to have been buggy before that, judging from these outputs:
# 1.3.0
Test Summary:         | Pass Fail Total
All                   | 16 7 23
  Float16             | 6 5 11
    div(514, 0.75)    | 2 2
    fld(514, 0.75)    | 2 2
    cld(515, 0.75)    | 2 2
    Extended Failures | 5 5
  Float32             | 10 2 12
    cld               | 2 2
    fld               | 1 1 2
    regular division  | 2 2
    Extended Failures | 5 1 6

# 1.4.0
Test Summary:         | Pass Fail Total
All                   | 17 6 23
  Float16             | 8 3 11
    div(514, 0.75)    | 2 2
    fld(514, 0.75)    | 2 2
    cld(515, 0.75)    | 1 1 2
    Extended Failures | 5 5
  Float32             | 9 3 12
    cld               | 1 1 2
    fld               | 1 1 2
    regular division  | 2 2
    Extended Failures | 5 1 6

# master
Test Summary:         | Pass Fail Total  Time
All                   | 16 7 23  0.8s
  Float16             | 7 4 11  0.7s
    div(514, 0.75)    | 1 1 2  0.6s
    fld(514, 0.75)    | 2 2  0.1s
    cld(515, 0.75)    | 1 1 2  0.0s
    Extended Failures | 5 5  0.0s
  Float32             | 9 3 12  0.1s
    cld               | 1 1 2  0.0s
    fld               | 1 1 2  0.0s
    regular division  | 2 2  0.0s
    Extended Failures | 5 1 6  0.0s
The passing Float16 tests in 1.3 are a bit surprising, since back then these were implemented by doing the division in Float32 and converting the result to Float16 afterwards. As far as I can tell, these are/were untested :grimacing:
Full tests & outputs can be found in this gist.
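Roughly, that pre-1.4 approach amounts to something like the following sketch (illustrative only, not the actual 1.3 source; old_fld and old_cld are made-up names):

# Loose sketch of the pre-1.4 behaviour described above: widen, operate, narrow.
old_fld(x::Float16, y::Float16) = Float16(fld(Float32(x), Float32(y)))
old_cld(x::Float16, y::Float16) = Float16(cld(Float32(x), Float32(y)))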
A divisor producing the wrong result can be found for over 51% of all possible 16-bit floats.
Sounds like you have a testsuite covering these cases, it would be great if that could be incorporated into our testsuite!
I wrote a function that counts the number of floats with a floor above (or a ceiling below) the result of regular division, using a nested loop. Running it over the full set of floats, it looks like my earlier 51% figure was an underestimate: a divisor producing the wrong result can be found for 68.4% of all possible 16-bit floats.
"""
Returns the number of 16-bit floats for which
a divisor exists such that cld or fld round in
an incorrect direction.
Usage:
julia> n = count_wrong_floats();
julia> n / 2^16
0.68414306640625
"""
function count_wrong_floats()
count = 0
for i in 0x0000:0xffff
x = reinterpret(Float16, i)
for j in 0x0000:0xffff
y = reinterpret(Float16, j)
quotient = x / y
floor_is_wrong = fld(x, y) > quotient
ceil_is_wrong = cld(x, y) < quotient
if floor_is_wrong || ceil_is_wrong
count += 1
break
end
end
end
count
end
Ah, gotcha - that seems to be a tad bit high. I'm only counting 0.05%, when taking both arguments into account:
julia> using SparseArrays  # for spzeros

julia> function count_wrong_floats()
           ret = spzeros(Bool, 0x10000, 0x10000)
           for i in 0x0000:0xffff
               x = reinterpret(Float16, i)
               for j in 0x0000:0xffff
                   y = reinterpret(Float16, j)
                   quotient = x / y
                   floor_is_wrong = fld(x, y) > quotient
                   ceil_is_wrong = cld(x, y) < quotient
                   ret[Int(i)+1, Int(j)+1] = floor_is_wrong | ceil_is_wrong
               end
           end
           ret
       end
count_wrong_floats (generic function with 1 method)
julia> res = count_wrong_floats();
julia> count(res) / length(res)
0.0005839932709932327
The sparsity pattern is insightful:
With one quadrant of that looking like:
julia> quadrant = res[1:end÷2, 1:end÷2]
32768×32768 SparseMatrixCSC{Bool, Int64} with 627058 stored entries:
⎡⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎤
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⡁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⢻⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⢸⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⢸⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⢹⣤⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⢻⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠻⣿⣿⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠈⠻⢇⣀⣀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠈⠻⣿⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠈⠹⣶⣶⣶⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⣀⣀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣶⣶⣶⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⣀⣀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⣿⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠛⣤⣤⣤⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⣿⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠹⣧⣤⣤⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⣿⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⢇⣀⣀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠹⣶⣶⣶⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⣀⣀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠻⣶⣶⣶⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⣀⣀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠛⣤⣤⣤⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⣿⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣧⣤⣤⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠻⣿⣿⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣇⣀⣀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⡿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎦
The repeating nature of this pattern suggests that the issue is independent of the sign of the inputs.
Zooming in on one of these spikes (specifically, the one at x=11265:12288 and y=1024:2048, both from the top left corner) and rendering it:

Red means floor_is_wrong, blue means ceil_is_wrong. Maybe someone knowledgeable in floating point shenanigans can glean what is wrong with our implementations from that?
EDIT: After managing to save the full image (which browsers don't like to render..), I can confirm that the other spikes to the bottom right are exactly the same as the one from this image.
That error rate suggests that this bug manifests once out of every ~1,700 randomly chosen Float16 divisions:
julia> 1 / 0.0005839932709932327
1712.3484972681954
That's frequent enough that randomized testing should let us estimate the prevalence of the bug at higher precisions, so I put together a quick script (see below).
Based on a billion samples, the bug manifests once out of every ~10,000 randomly chosen Float32 divisions and once out of every ~74,000 Float64 divisions.
julia> function estimate_error_rate(F, U, n)
           count = 0
           # Generate random numbers for testing
           X = reinterpret.(F, rand(U, n))
           Y = reinterpret.(F, rand(U, n))
           for (x, y) in zip(X, Y)
               quotient = x / y
               # Increment the count when we observe the bug
               count += Int(fld(x, y) > quotient || cld(x, y) < quotient)
           end
           count / n
       end
estimate_error_rate (generic function with 1 method)
julia> rate = estimate_error_rate(Float32, UInt32, 1_000_000_000)
0.000101178
julia> 1 / rate
9883.571527407144
julia> rate = estimate_error_rate(Float64, UInt64, 1_000_000_000)
1.3525e-5
julia> 1 / rate
73937.15341959335
Here's what I found digging into this:
julia> x,y,r = Float16(514),Float16(0.75),RoundDown
(Float16(514.0), Float16(0.75), RoundingMode{:Down}())
julia> divrem(widen(x),y,r)
(685.0f0, 0.25f0)
julia> divrem(x,y,r) # div is wrong, but rem is correct
(Float16(686.0), Float16(0.25))
julia> widen(x)/y
685.3333f0
julia> x/y
Float16(685.5)
julia> # div(x::T, y::T, r::RoundingMode) where {T<:AbstractFloat} = convert(T,round((x-rem(x,y,r))/y))
div(x,y,r)
Float16(686.0)
julia> eps(x)
Float16(0.5)
julia> x-rem(x,y,r) # == x due to limited precision
Float16(514.0)
julia> (x-rem(x,y,r)) / y
Float16(685.5)
So the issue appears to be with the implementation

div(x::T, y::T, r::RoundingMode) where {T<:AbstractFloat} = convert(T,round((x-rem(x,y,r))/y))

It assumes that x-rem(x,y,r) will be roughly evenly divisible by y. This is only true if there is small roundoff error in the subtraction. It then rounds the number to the nearest integer (which would address small errors in the subtraction, but not large ones). This means that, in the worst case of total underflow during the subtraction (as we see in the example), we effectively get div(x,y,::RoundingMode) = round(x/y). One notices that the rounding mode is effectively "unused" when the remainder is small relative to the numerator.
These examples show that this manifests when x - rem(x,y,r) == x and the floating point rounding mode (usually RoundNearest, where ties go to even) resolves a 0.5 fractional part of x/y in the "unlucky" direction.
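To spell out those two conditions on the example above (same numbers as the REPL session just shown):

x, y, r = Float16(514), Float16(0.75), RoundDown
x - rem(x, y, r) == x                     # true: 514 - 0.25 rounds straight back to 514
(x - rem(x, y, r)) / y == Float16(685.5)  # true: the computed quotient has a .5 fractional part
div(x, y, r) == Float16(686)              # round() then resolves that tie upward...
div(x, y, r) > widen(x) / widen(y)        # ...so the RoundDown result exceeds the true quotient 685.33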
Also, with this failure mode in mind, it's entirely possible to construct failing examples in any precision:
julia> x,y,r = 1e0,eps(1e0)*(3//2),RoundUp
(1.0, 3.3306690738754696e-16, RoundingMode{:Up}())
julia> div(x,y,RoundDown) <= x/y, div(x,y,RoundUp) >= x/y
(true, false)
Note that BigFloat has its own method and does not suffer from this bug.
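For instance, a quick illustrative check with the same inputs as the failing Float16 example (using the default BigFloat precision):

fld(big"514.0", big"0.75") == 685   # true: the BigFloat method floors the exact quotient 685.33...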
It appears that redefining
newdiv(x::T, y::T, r::RoundingMode) where {T<:AbstractFloat} = convert(T,round((x-rem(x,y,r))/y,r)) # add rounding mode to `round` call
function count_wrong_floats()
    count_old_wrong = 0
    count_new_wrong = 0
    for i in 0x0000:0xffff
        x = reinterpret(Float16, i)
        for j in 0x0000:0xffff
            y = reinterpret(Float16, j)
            quotient = x / y
            old_is_wrong = (div(x, y, RoundDown) > quotient) | (div(x, y, RoundUp) < quotient)
            new_is_wrong = (newdiv(x, y, RoundDown) > quotient) | (newdiv(x, y, RoundUp) < quotient)
            count_old_wrong += old_is_wrong
            count_new_wrong += new_is_wrong
        end
    end
    return count_old_wrong, count_new_wrong
end
count_wrong_floats()
# (2511024, 2792)
versioninfo()
#=
Julia Version 1.8.0
Commit 5544a0fab76 (2022-08-17 13:38 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 12 × Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
Threads: 1 on 12 virtual cores
=#
has significantly reduced the number of incorrect input pairs. This still feels like it may not be treating the full extent of these roundoff issues, however. I'm out of time to chase down the remaining issues right now.
It's actually very fortunate that we use the same methods for all IEEEFloat types. It means that if we can fix it for all Float16 pairs then it will probably be correct for all Float32 or Float64 pairs.
I wanted to plot the new sparsity pattern, but now I cannot reproduce those numbers on my end:
julia> function count_wrong_floats()
           count_old_wrong = 0
           count_new_wrong = 0
           for i in 0x0000:0xffff
               x = reinterpret(Float16, i)
               for j in 0x0000:0xffff
                   y = reinterpret(Float16, j)
                   quotient = x / y
                   old_is_wrong = (div(x, y, RoundDown) > quotient) | (div(x, y, RoundUp) < quotient)
                   new_is_wrong = (newdiv(x, y, RoundDown) > quotient) | (newdiv(x, y, RoundUp) < quotient)
                   count_old_wrong += old_is_wrong
                   count_new_wrong += new_is_wrong
               end
           end
           return count_old_wrong, count_new_wrong
       end
count_wrong_floats (generic function with 1 method)
julia> count_wrong_floats()
(2508232, 0)
julia> versioninfo()
Julia Version 1.10.0-DEV.891
Commit 7341fb9517* (2023-03-28 05:15 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: 24 × AMD Ryzen 9 7900X 12-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, znver3)
Threads: 24 on 24 virtual cores
Environment:
JULIA_NUM_THREADS = 24
Which version are you on? I actually tried that exact same fix earlier today, but ran into some problems with Float32 still not being fixed - perhaps that was a fluke though..
Which version are you on?
I added versioninfo to my earlier post. It's v1.8.0 (kinda old) on Intel x86-64 (non-native Float16). I ran the test again in a fresh session and reproduced my earlier number of failures. I'll remark that my number of resolved failures 2511024 - 2792 equals your number of resolved failures 2508232, so it appears some orthogonal issue I was seeing is resolved by a post-1.8.0 version or your hardware.

It's interesting to hear that some Float32 examples were not resolved by this. Can you post an example or two?
Well, if we already use a round, I don't see the problem with:

newdiv2(x::T, y::T, r::RoundingMode) where {T<:AbstractFloat} = convert(T,round(x/y,r))

The code (x-rem(x,y,r))/y seems to have been implemented before round and ensures that x/y can be implemented using integer division. After implementing round someone seems to have directly wrapped it without removing the rem part!

With the above definition (newdiv2), count_wrong_floats() is zero.
Edit: minor editorial changes.
The code (x-rem(x,y,r))/y seems to have been implemented before round and ensures that x/y can be implemented using integer division. After implementing round someone seems to have directly wrapped it without removing the rem part!
It appears that the rem component cannot be easily discarded. Consider this example: what should fld(1.0,0.1) produce? At a glance, one would say 10.0. This is wrong. Recall that 0.1 is not exactly representable in Float64.
julia> big(0.1)
0.1000000000000000055511151231257827021181583404541015625
julia> 1.0 / 0.1
10.0
julia> fma(0.1, 10.0, -1.0) # 0.1 * 10.0 > 1.0 (before rounding)
5.551115123125783e-17
julia> divrem(1, 0.1, RoundDown)
(9.0, 0.09999999999999995)
We see that we cannot ensure the correct result by computing x / y in the default floating point environment and then rounding the way we want. But we currently don't have good control over the floating point rounding mode (I think? setrounding only works for BigFloat and has a spooky warning about a lack of thread safety) and I'd be anxious trying to mess with it in any case (I don't remember if it's thread-local or processor-local or varies by architecture).
So some form of precision extension, rem, fma, or other mechanism of delayed rounding (or a way to briefly change the floating point environment without risking breaking everything elsewhere) appears necessary to get this right.
A proof-of-concept fix using double-word floating-point arithmetic could maybe look something like this (not tested yet):
const two_mul = Base.Math.two_mul

function fast_two_sum(a, b)
    s = a + b
    z = s - a
    t = b - z
    (s, t)
end

# Double-word floating-point division of two single-word numbers.
#
# Assumes `rounding(T) == RoundNearest`
#
# Adapted from "DWDivFP3" AKA "Algorithm 15" from
# https://doi.org/10.1145/3121432 by Joldes, Muller, Popescu.
#
# Unlike two_mul and fast_two_sum, this isn't error-free.
function two_div(x, y)
    hi = x / y
    π = two_mul(hi, y)
    lo = ((x - first(π)) - last(π)) / y
    fast_two_sum(hi, lo)
end

double_word_sign(hi, lo) = iszero(hi) ? sign(lo) : sign(hi)

div_sign_helper(sign, ::RoundingMode{:Nearest}) = zero(sign)
div_sign_helper(sign, ::RoundingMode{:NearestTiesAway}) = zero(sign)
div_sign_helper(sign, ::RoundingMode{:NearestTiesUp}) = zero(sign)
div_sign_helper(sign, ::RoundingMode{:ToZero}) = -sign
div_sign_helper(sign, ::RoundingMode{:FromZero}) = sign
div_sign_helper(sign, ::RoundingMode{:Up}) = one(sign)
div_sign_helper(sign, ::RoundingMode{:Down}) = -one(sign)

function Base.div(x::T, y::T, r::RoundingMode) where {T<:AbstractFloat}
    (hi, lo) = two_div(x, y)
    rounded = convert(T, round(hi, r))
    increment = zero(T)
    if (hi == rounded) & !iszero(lo)
        # hi is integral and lo isn't zero: lo matters
        increment = div_sign_helper(double_word_sign(hi, lo), r)
        (!iszero(increment) & (rounded == (rounded + increment))) &&
            throw(InexactError(:div, T, (rounded, increment)))
    end
    return rounded + increment
end
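If the sketch behaves as intended, the thread's original failing pair should come out right; the values below are expectations only (like the sketch itself, this is untested):

x, y = Float16(514), Float16(0.75)
fld(x, y)  # expected 685.0 (fld forwards to the div method defined above)
cld(x, y)  # expected 686.0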
I guess that the assumption of rounding(T) == RoundNearest (necessary for multi-word arithmetics AFAIK) is OK considering that the round documentation already says:

round may give incorrect results if the global rounding mode is changed
The code (x-rem(x,y,r))/y seems to have been implemented before round and ensures that x/y can be implemented using integer division. After implementing round someone seems to have directly wrapped it without removing the rem part!

It appears that the rem component cannot be easily discarded. Consider this example: what should fld(1.0,0.1) produce? At a glance, one would say 10.0. This is wrong. Recall that 0.1 is not exactly representable in Float64.
Thank you for this comment. Indeed, as you argue, it is not meaningful to round at the same precision as the computation of x/y. fld (as above) seems to be written for (and seems most important for) integral arguments, anyway. However, in this case, I feel that simple rounddown(x/y), i.e.:

julia> fld(10.0,0.1)
99.0

is preferable compared to 100.0, in case anybody would want to use it. Furthermore, the documentation clarifies this point: https://docs.julialang.org/en/v1/base/math/#Base.fld (and seems to focus on floating-point numbers; not sure why). This is because, otherwise, the guarantee fld(x,y) <= x/y might not be met with conventional floating-point instructions, and we may have to implement the floating point division ourselves in higher precision in stdlib, which might be counterproductive.
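As a quick illustration of where the two candidate answers come from (big(0.1) promotes the already-rounded Float64 value, so the comparison is between rounding before and after the division):

10.0 / 0.1                # 100.0: the true quotient 99.9999... rounds up to 100.0 in Float64
floor(10.0 / 0.1)         # 100.0: flooring the already-rounded quotient
fld(big(10.0), big(0.1))  # 99.0:  flooring the exact quotient of the same two values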
IMHO as well, the perfect way, of course, is to tell the Float16 FPU or software, whose job this is anyway, to round appropriately. But wrapping it in round seems to be the second best approach and, IMHO, the best we can do in base.

But kindly also advise if you see any issues with fld(10.0,0.1) being 99.0? If the reasoning is strong, we can argue to change the behaviour.
Would it be useful to look at the implementation of this in other languages?
LLVM's libc is usually approachable
I don't think libc has a function for this. The closest is the integer value returned by remquo, but that is only guaranteed to give the three least significant bits.
Note that all these examples have eps(x/y) >= 0.5. Once eps(x/y) > 1, then everything sort of breaks down, since you may not actually be able to represent the quotient anyway.

Apparently at one point I proved our division was exact for cases where eps(x/y) < 0.5 (https://github.com/JuliaLang/julia/issues/4156#issuecomment-72632102), but I don't recall the details.
We could easily fix these for Float16 and Float32 by promoting to Float64? In fact, you might even be able to use simply trunc(Float64(x) / Float64(y)) if you can extend the results from this paper: http://www.vinc17.net/research/papers/rr_intdiv.pdf
For a Float64 example:
julia> div(9007199254740994, 3)
3002399751580331
julia> Int(div(9007199254740994.0, 3.0))
3002399751580330
julia> eps(9007199254740994.0/3.0)
0.5
Python v3.11.3 can't handle this either
>>> 9007199254740994 // 3
3002399751580331
>>> 9007199254740994.0 // 3.0
3002399751580330.0
In fact, you might even be able to use simply trunc(Float64(x) / Float64(y)) if you can extend the results from this paper: http://www.vinc17.net/research/papers/rr_intdiv.pdf

It's even easier, Theorem 1 is sufficient: if x,y are Float32, and eps(x/y) <= 1, then that implies x <= 2^25 * y, and consequently x-y has at most 25+24=49 significant bits, which can be exactly represented in a Float64.
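A sketch of that suggestion for the narrower types (the names are made up, and whether a single widened division plus floor suffices in every case is exactly what the cited results would need to establish):

# Hypothetical helpers following the suggestion above: do the division in Float64,
# take the floor there, and narrow the result back down.
fld_widened(x::Float32, y::Float32) = Float32(floor(Float64(x) / Float64(y)))
fld_widened(x::Float16, y::Float16) = Float16(floor(Float64(x) / Float64(y)))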
I'm going to start thinking about this issue since I need to read the material for background on a paper I'm reviewing.
I think in hindsight we should not have implemented these methods for floating point types: although there is a precise definition, it seems unlikely to be useful and in some cases is impossible to compute efficiently. I think the best change here is probably to just add a check for whether the result may be ambiguous (which is just a check of eps on the remainder to make sure it is sufficiently small that we won't misround), and in that case redo the computation with a sufficiently wide BigFloat.
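A rough sketch of what that could look like (the name, the eps threshold, and the fallback precision are all assumptions here, not a vetted implementation; the threshold follows the eps(x/y) < 0.5 observation earlier in the thread):

function checked_div(x::T, y::T, r::RoundingMode) where {T<:AbstractFloat}
    # Fast path: believed unambiguous when the quotient has slack below the 1s place.
    eps(x / y) < 0.5 && return div(x, y, r)
    # Otherwise redo the computation with enough extra precision that the
    # intermediate quotient cannot be misrounded, then narrow back to T.
    setprecision(BigFloat, 2 * precision(T) + 32) do
        convert(T, div(BigFloat(x), BigFloat(y), r))
    end
end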
Julia has a div function that is used to implement floor division (fld) and ceil division (cld). I think I’ve found a bug that causes all of these division functions to return incorrect results.

Floor division is documented to return the largest integer less than or equal to x / y. It should never return a number that is greater than x / y. But it does:

Similarly, ceil division should never return a number that is smaller than regular division, but it does:

This behavior is not limited to 16-bit floats. Here’s a case where fld produces an incorrect result for Float32 inputs:

And here’s the same for cld:

The equivalent operations in Python produce the correct results:

Examples of this incorrect behavior are not hard to find – for most floats, you can find a divisor that will make either fld or cld return the wrong answer.

Here are some examples for Float16 where either fld or cld is incorrect:

cld(1, Float16(0.000999)) < 1 / Float16(0.000999)
cld(2, Float16(0.001999)) < 2 / Float16(0.001999)
cld(3, Float16(0.002934)) < 3 / Float16(0.002934)
cld(4, Float16(0.003998)) < 4 / Float16(0.003998)
fld(5, Float16(0.004925)) > 5 / Float16(0.004925)

And here are some for Float32:

fld(5, Float32(6.556511e-7)) > 5 / Float32(6.556511e-7)
fld(10, Float32(1.3113022e-6)) > 10 / Float32(1.3113022e-6)
fld(11, Float32(1.4305115e-6)) > 11 / Float32(1.4305115e-6)
cld(16, Float32(2.8014183e-6)) < 16 / Float32(2.8014183e-6)
cld(17, Float32(2.2053719e-6)) < 17 / Float32(2.2053719e-6)

For simplicity I’ve presented examples where the first argument is an integer; this bug also occurs for non-integral inputs.

A divisor producing the wrong result can be found for over 51% of all possible 16-bit floats. I have not evaluated how widespread this is for Float32, but the results above suggest that it is similarly easy to find failures there too.

I’ve tracked the invocations down to this definition of div, which has existed at least as far back as Julia 1.6: