Function negation is a few times slower than expected if not inlined

bicycle1885 commented 3 years ago

I found an interesting performance discrepancy between !isfoo and x -> !isfoo(x) in the following code:

@noinline function split2A(isprefix, s)
    pos = findfirst(!isprefix, s)
    pos === nothing && return SubString(s), SubString("")
    stop = prevind(s, pos)
    return @view(s[begin:stop]), @view(s[pos:end])
end

@inline function split2B(isprefix, s)
    pos = findfirst(!isprefix, s)
    pos === nothing && return SubString(s), SubString("")
    stop = prevind(s, pos)
    return @view(s[begin:stop]), @view(s[pos:end])
end

@noinline function split2C(isprefix, s)
    pos = findfirst(c -> !isprefix(c), s)
    pos === nothing && return SubString(s), SubString("")
    stop = prevind(s, pos)
    return @view(s[begin:stop]), @view(s[pos:end])
end

using BenchmarkTools
using Random
Random.seed!(1234)

strs = [randstring(30) for _ in 1:100_000]
loop(split2, strs) = for s in strs split2(isletter, s) end
for split2 in [split2A, split2B, split2C]
    println(split2)
    @btime loop($split2, $strs)
    @btime loop($split2, $strs)
end

~/w/julia (master|…) $ ./julia split2.jl
split2A
  21.730 ms (100000 allocations: 6.10 MiB)
  19.835 ms (0 allocations: 0 bytes)
split2B
  7.654 ms (0 allocations: 0 bytes)
  7.530 ms (0 allocations: 0 bytes)
split2C
  7.801 ms (0 allocations: 0 bytes)
  7.783 ms (0 allocations: 0 bytes)

As can be seen from the benchmark result, version A (!isprefix) is much slower than version B (!isprefix, inlined) and version C (c -> !isprefix(c), no inlined). The expected behavior is of course that these three versions have the same performance.

julia> versioninfo()
Julia Version 1.7.0-DEV.1093
Commit feba24fd5d* (2021-05-13 13:50 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: AMD Ryzen 9 3950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)
Environment:
  JULIA_PROJECT = @.

melonedo commented 3 years ago

Looking through @code_typed, the only difference is that split2A calls

invoke split2(Main.isletter::Function, %20::String)::Any

while split2C calls

invoke split2(Main.isletter::typeof(isletter), %20::String)::Any

Since Function is an abstract type, the performance difference is probably caused by dynamic dispatch. If I remember correctly, passing a function argument in the first line of a function definition is a hint, which means we should pass the function as is without specializing. So if we force specialization... (note: we need to type all of the arguments, implicit type parameter are not injected in this case)

@noinline function split2D(isprefix::T, s::S) where {T, S}
    pos = findfirst(!isprefix, s)
    pos === nothing && return SubString(s), SubString("")
    stop = prevind(s, pos)
    return @view(s[begin:stop]), @view(s[pos:end])
end

...then the numbers are good

julia>for split2 in [split2C, split2D]
    println(split2)
    @btime loop($split2, $strs)
    @btime loop($split2, $strs)
end
split2C
  13.188 ms (0 allocations: 0 bytes)
  13.147 ms (0 allocations: 0 bytes)
split2D
  12.940 ms (0 allocations: 0 bytes)
  12.916 ms (0 allocations: 0 bytes)

Conclusion: you should really inline these functions in order to make the hint work right! BTW, is there an exhaustive list of hints?

bicycle1885 commented 3 years ago

Thank you for the clarification.

Changing the definition of function negation so that it generates a specialized code seems significantly close the gap:

!(f::T) where T <: Function = (x...) -> !f(x...)

~/w/julia (master|…) $ ./julia split2.jl
split2A
  12.007 ms (100000 allocations: 6.10 MiB)
  10.704 ms (0 allocations: 0 bytes)
split2B
  7.688 ms (0 allocations: 0 bytes)
  7.668 ms (0 allocations: 0 bytes)
split2C
  7.709 ms (0 allocations: 0 bytes)
  7.848 ms (0 allocations: 0 bytes)

But there is still a minor performance gap between split2A and other two.

BTW, is there an exhaustive list of hints?

Sorry, I'm not sure what you mean. Hints of what?

melonedo commented 3 years ago

Oh, I shouldn't have asked for those hints here, it's irrelevant to the issue. I doubt whether we need to change that, because the simple ! function should almost always be inlined into another function, so specialization is not an issue. As can be seen from your issue, inlining ! works great and solves the problem. This behavior is also expected for other higher-order functions like map or reduce, because each independent function/lambda/functor will have its own type, and specializing means that we need to specialize nearly every invocation of !. See Compiler Efficiency Issues. However, I am not certain whether inlined functions need to record their specialization at all.

JuliaLang / julia

Function negation is a few times slower than expected if not inlined #40815