JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.61k stars 5.48k forks source link

Assigning to Nullable fields is faster than Union{Nothing,T} fields in some cases #29542

Open dcjones opened 6 years ago

dcjones commented 6 years ago

While optimizing some code (https://github.com/BioJulia/IntervalTrees.jl/pull/46) I found that storing some fields as Nullable rather than Union{Nothing,...} made assignment to those fields significantly faster, which seems contrary to the convention that Union{Nothing,...} is preferable in juila 1.0.

Here's the issue reproduced in a test case where the Nullable version is about 3 times faster.

using Nullables

# generate some Union{Nothing, Float64} value
function rand_float_or_nothing()
    x = rand()
    return x < 0.5 ? nothing : x
end

# storing with nothing
mutable struct T1
    x::Union{Nothing, Float64}
end

function setx!(t::T1)
    x = rand_float_or_nothing()
    t.x = x
end

# storing with nullable
mutable struct T2
    x::Nullable{Float64}
end

function setx!(t::T2)
    x = rand_float_or_nothing()
    t.x = x === nothing ? Nullable{Float64}() : Nullable{Float64}(x)
end

# do a bunch of assignments
function benchmark(t)
    @time for _ in 1:10000000
        setx!(t)
    end
end

# 0.088161 seconds
benchmark(T1(0.0))

# 0.032301 seconds
benchmark(T2(Nullable{Float64}(0.0)))
KristofferC commented 6 years ago

Is the # 0.88161 seconds a typo and should be # 0.088161 seconds? That is what I get at least (1.0.1).

dcjones commented 6 years ago

It was a typo, I edited it.

KristofferC commented 6 years ago

FWIW, adding a @nolinline on the rand_float_or_nothing makes both equally slow (0.13) seconds.

KristofferC commented 6 years ago

And on master there is no problem, because they are equally slow :D

julia>  benchmark(T1(0.0))
  0.081509 seconds

julia> benchmark(T2(Nullable{Float64}(0.0)))
  0.084752 seconds

julia> versioninfo()
Julia Version 1.1.0-DEV.403
dcjones commented 6 years ago

One interesting observation: if you change x < 0.5 to x > 0.0 or x < 0.0 (or anything else to make it deterministic or nearly deterministic either way), they're equally efficient, presumably because branch prediction is then successful.

The nothing version has to branch on whether x is nothing to decide which setfield! method to call, but the nullable version has to do the same thing to decide what constructor to call, so it isn't obvious to me why it doesn't pay the same price.

KristofferC commented 6 years ago

I made a version that didn't include generating random numbers during the benchmarking process and also shows the same performance characteristics on 1.0.1 and master.

using Nullables

# generate some Union{Nothing, Float64} value
function float_or_nothing(i, X)
    @inbounds x = X[i]
    return x < 0.5 ? nothing : x
end

# storing with nothing
mutable struct T1
    x::Union{Nothing, Float64}
end

function setx!(t::T1, i, X)
    x = float_or_nothing(i, X)
    t.x = x
end

# storing with nullable
mutable struct T2
    x::Nullable{Float64}
end

function setx!(t::T2, i, X)
    x = float_or_nothing(i, X)
    t.x = x === nothing ? Nullable{Float64}() : Nullable{Float64}(x)
end

# do a bunch of assignments
function benchmark(t, X)
    for i in 1:length(X)
        setx!(t, i, X)
    end
end

using BenchmarkTools

const N = 10_000_000
X = rand(N)

@btime benchmark(T1(0.0), X)
#  53.187 ms (1 allocation: 32 bytes)

@btime benchmark(T2(Nullable{Float64}(0.0)), X)
# 6.815 ms (1 allocation: 32 bytes)

Changing the condition to x < 0.0 or x> 1.0 I get

julia> @btime benchmark(T1(0.0), X)
# 9.334 ms (1 allocation: 32 bytes)

julia> @btime benchmark(T2(Nullable{Float64}(0.0)), X)
#  6.869 ms (1 allocation: 32 bytes)

so it seems the union one is much more sensitive to branch prediction or something...

And again, @noinline on the float_or_nothing and they are back to the same speed for deterministic behavior but for x < 0.5 suddenly the Union wins...

# @noinline
julia> @btime benchmark(T1(0.0), X)
#  69.776 ms (1 allocation: 32 bytes)

julia> @btime benchmark(T2(Nullable{Float64}(0.0)), X)
#  76.681 ms (1 allocation: 32 bytes)
KristofferC commented 6 years ago

And just as a confirmation, sorting X also gives a big boost in the Union case

julia> X2 = sort(X);

julia> @btime benchmark(T1(0.0), X)
#  52.635 ms (1 allocation: 32 bytes)

julia> @btime benchmark(T1(0.0), X2)
#  8.646 ms (1 allocation: 32 bytes)

but the nullable version has to do the same thing to decide what constructor to call, so it isn't obvious to me why it doesn't pay the same price.

Does it though? For the Nullable isnt it calling Nullable{Float64}(::Bool, ::Float64) in both cases, just in one of the cases it is (false, uninit) and in the second it is (true, x).