JuliaSymbolics / SymbolicUtils.jl

Symbolic expressions, rewriting and simplification
https://docs.sciml.ai/SymbolicUtils/stable/

Code generation inefficiency? #441

Closed pitsianis closed 2 years ago

pitsianis commented 2 years ago

I want to take the derivative of a function and generate code that is as efficient as if I had typed the derivative by hand. So I wrote

function ∇(f)
  @variables x

  dfx = Symbolics.derivative(f(x),x)
  eval(SymbolicUtils.Code.toexpr(SymbolicUtils.Code.Func([x],[],dfx)))

end
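As a side note, a similar generator can be sketched against `Symbolics.build_function`, the documented code-generation entry point; the version below is only a sketch (the name `∇2` is hypothetical), using the documented `expression = Val{false}` keyword to get a compiled callable back directly, with no explicit `eval`:

```julia
using Symbolics

# Sketch of an alternative using Symbolics.build_function: with
# expression = Val{false} it returns a compiled callable
# (a RuntimeGeneratedFunction) rather than an Expr, so no eval is needed.
function ∇2(f)
    @variables x
    dfx = Symbolics.derivative(f(x), x)
    Symbolics.build_function(dfx, x; expression = Val{false})
end

sdf2 = ∇2(x -> log(1 + x))
sdf2(2.0)  # d/dx log(1+x) = 1/(1+x), so ≈ 1/3 at x = 2
```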

For a simple example

f(x) = log(1+x)
df(x) = (1 + x)^-1
sdf = ∇(f)

The timing test, using Julia Version 1.7.2 (2022-02-06, official release from https://julialang.org/) on macOS 11.6.5 (20G527), with

function test(x)
  @btime sdf($x)
end

function testdirect(x)
  @btime df($x)
end

println("Synthetic function $(test(2.0))")
println("Direct    function $(testdirect(2.0))")

returns

  29.734 ns (2 allocations: 32 bytes)
Synthetic function 0.3333333333333333
  4.552 ns (0 allocations: 0 bytes)
Direct    function 0.3333333333333333

Why is the synthetic derivative 6x slower, and why does it make two allocations? How do I use it properly?

I appreciate your hard work and I hope I can get up to speed to help at some time in the near future.

ChrisRackauckas commented 2 years ago

I think this is just in the allocating versions of the functions?

pitsianis commented 2 years ago

I am even more confused!

The code produced seems much simpler for sdf compared to df. Yet sdf allocates memory and runs slower.

julia> @code_lowered sdf(2.0)
CodeInfo(
1 ─ %1 = (+)(1, x)
│   %2 = (inv)(%1)
└──      return %2
)

julia> @code_lowered df(2.0)
CodeInfo(
1 ─ %1 = 1 + x
│   %2 = Core.apply_type(Base.Val, -1)
│   %3 = (%2)()
│   %4 = Base.literal_pow(Main.:^, %1, %3)
└──      return %4
)
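(Worth noting: `@code_lowered` shows pre-optimization IR, so it cannot explain a timing gap. To compare what actually runs, the standard Base reflection macros below show the code after type inference and after LLVM optimization; for a `Float64` input, `Base.literal_pow(^, x, Val(-1))` inlines to `inv(x)`, so the two definitions end up essentially identical.)

```julia
# @code_lowered shows pre-optimization IR; to compare the code that
# actually executes, inspect the type-inferred or LLVM form instead:
df(x) = (1 + x)^-1

@code_typed df(2.0)   # literal_pow with Val(-1) is inlined to inv, i.e. a divide
@code_llvm df(2.0)    # the optimized IR that actually runs
```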

And if I use the REPL, the time difference is even more dramatic

julia> @btime sdf(3.0)
  23.003 ns (1 allocation: 16 bytes)
0.25

julia> @btime df(3.0)
  0.042 ns (0 allocations: 0 bytes)
0.25

Any explanation for any of the above?

MasonProtter commented 2 years ago

This is just a case of not benchmarking correctly. There are some subtle, tricky things involved in properly benchmarking functions in Julia, so I'll explain the two things that went wrong here:

First of all, a benchmark time of 0.042 ns indicates that nothing actually happened, because a CPU cannot do meaningful work that fast. Essentially, the compiler saw that @btime was calling df with a constant input, so it evaluated the function at compile time and replaced the call with its value. The way to avoid this is to interpolate a Ref object into the benchmark expression and then dereference it.
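A minimal illustration of that trick (the function `g` is just a made-up example; a sub-nanosecond timing is the tell-tale sign of constant folding):

```julia
using BenchmarkTools

g(x) = x + 1.0

@btime g(2.0)             # constant input: may be folded away at compile time
@btime g($(Ref(2.0))[])   # Ref interpolation defeats constant propagation
```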

Secondly, sdf as you've defined it here is a non-constant global variable, unlike df, so the compiler needs to look up its value before each call in the benchmark loop. If you make it a const, things should be fast.

julia> using Symbolics, SymbolicUtils

julia> function ∇(f)
         @variables x

         dfx = Symbolics.derivative(f(x),x)
         eval(SymbolicUtils.Code.toexpr(SymbolicUtils.Code.Func([x],[],dfx)))

       end
∇ (generic function with 1 method)

julia> f(x) = log(1+x)
f (generic function with 1 method)

julia> df(x) = (1 + x)^-1
df (generic function with 1 method)

julia> const sdf = ∇(f)
#3 (generic function with 1 method)

julia> let x = Ref(1.0)
           @btime df($x[])
           @btime sdf($x[])
       end
  1.256 ns (0 allocations: 0 bytes)
  1.497 ns (0 allocations: 0 bytes)
0.5
pitsianis commented 2 years ago

Thanks Mason!