Open dcjones opened 6 years ago
Is the # 0.88161 seconds
a typo and should be # 0.088161 seconds
? That is what I get at least (1.0.1).
It was a typo, I edited it.
FWIW, adding a @nolinline
on the rand_float_or_nothing
makes both equally slow (0.13) seconds.
And on master there is no problem, because they are equally slow :D
julia> benchmark(T1(0.0))
0.081509 seconds
julia> benchmark(T2(Nullable{Float64}(0.0)))
0.084752 seconds
julia> versioninfo()
Julia Version 1.1.0-DEV.403
One interesting observation: if you change x < 0.5
to x > 0.0
or x < 0.0
(or anything else to make it deterministic or nearly deterministic either way), they're equally efficient, presumably because branch prediction is then successful.
The nothing version has to branch on whether x
is nothing to decide which setfield!
method to call, but the nullable version has to do the same thing to decide what constructor to call, so it isn't obvious to me why it doesn't pay the same price.
I made a version that didn't include generating random numbers during the benchmarking process and also shows the same performance characteristics on 1.0.1 and master.
using Nullables
# generate some Union{Nothing, Float64} value
function float_or_nothing(i, X)
@inbounds x = X[i]
return x < 0.5 ? nothing : x
end
# storing with nothing
mutable struct T1
x::Union{Nothing, Float64}
end
function setx!(t::T1, i, X)
x = float_or_nothing(i, X)
t.x = x
end
# storing with nullable
mutable struct T2
x::Nullable{Float64}
end
function setx!(t::T2, i, X)
x = float_or_nothing(i, X)
t.x = x === nothing ? Nullable{Float64}() : Nullable{Float64}(x)
end
# do a bunch of assignments
function benchmark(t, X)
for i in 1:length(X)
setx!(t, i, X)
end
end
using BenchmarkTools
const N = 10_000_000
X = rand(N)
@btime benchmark(T1(0.0), X)
# 53.187 ms (1 allocation: 32 bytes)
@btime benchmark(T2(Nullable{Float64}(0.0)), X)
# 6.815 ms (1 allocation: 32 bytes)
Changing the condition to x < 0.0
or x> 1.0
I get
julia> @btime benchmark(T1(0.0), X)
# 9.334 ms (1 allocation: 32 bytes)
julia> @btime benchmark(T2(Nullable{Float64}(0.0)), X)
# 6.869 ms (1 allocation: 32 bytes)
so it seems the union one is much more sensitive to branch prediction or something...
And again, @noinline
on the float_or_nothing
and they are back to the same speed for deterministic behavior but for x < 0.5
suddenly the Union wins...
# @noinline
julia> @btime benchmark(T1(0.0), X)
# 69.776 ms (1 allocation: 32 bytes)
julia> @btime benchmark(T2(Nullable{Float64}(0.0)), X)
# 76.681 ms (1 allocation: 32 bytes)
And just as a confirmation, sorting X
also gives a big boost in the Union case
julia> X2 = sort(X);
julia> @btime benchmark(T1(0.0), X)
# 52.635 ms (1 allocation: 32 bytes)
julia> @btime benchmark(T1(0.0), X2)
# 8.646 ms (1 allocation: 32 bytes)
but the nullable version has to do the same thing to decide what constructor to call, so it isn't obvious to me why it doesn't pay the same price.
Does it though? For the Nullable
isnt it calling Nullable{Float64}(::Bool, ::Float64)
in both cases, just in one of the cases it is (false, uninit)
and in the second it is (true, x)
.
While optimizing some code (https://github.com/BioJulia/IntervalTrees.jl/pull/46) I found that storing some fields as
Nullable
rather thanUnion{Nothing,...}
made assignment to those fields significantly faster, which seems contrary to the convention thatUnion{Nothing,...}
is preferable in juila 1.0.Here's the issue reproduced in a test case where the Nullable version is about 3 times faster.