EnzymeAD / Enzyme.jl

Julia bindings for the Enzyme automatic differentiator
https://enzyme.mit.edu
MIT License

Reverse mode gradient is up to 100x slower in Enzyme v0.12 #1409

Closed: gdalle closed this issue 2 months ago

gdalle commented 2 months ago

I think it is a consequence of the type instability spotted in #1401
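For readers unfamiliar with the term, here is a generic illustration of type instability and how to detect it. It is not the Enzyme code path discussed in #1401; the function names `unstable_relu`, `stable_relu`, and `inferred_return` are made up for this sketch, and only Base and the `Test` standard library are used.

```julia
using Test  # standard library; provides @inferred

# Type-unstable: returns a Float64 or an Int depending on the runtime value,
# so the compiler can only infer a Union for the return type.
unstable_relu(x::Float64) = x > 0 ? x : 0

# Type-stable: both branches return the same type as the input.
stable_relu(x::Float64) = x > 0 ? x : zero(x)

# Hypothetical helper: ask the compiler which return type it infers.
inferred_return(f, argtypes) = only(Base.return_types(f, argtypes))

println(inferred_return(unstable_relu, (Float64,)))  # a Union, not a concrete type
println(inferred_return(stable_relu, (Float64,)))    # a concrete type

# @inferred throws an error if the inferred type differs from the actual one.
@inferred stable_relu(1.0)
```

When a return type is not concrete, callers typically have to box the value and dispatch at run time, which tends to show up as extra allocations and a fixed per-call cost.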

Setup:

julia> using Chairmarks, Enzyme

julia> f(x) = sum(abs2, x);

julia> bench_outofplace(n) = @b (; x=rand(n),) gradient(Enzyme.Reverse, f, _.x)

julia> bench_inplace(n) = @b (; x=rand(n), g=rand(n)) gradient!(Enzyme.Reverse, _.g, f, _.x) evals=1

Out-of-place benchmarks:

Size | Enzyme v0.11 | Enzyme v0.12
10 | 38.921 ns (1 allocs: 144 bytes) | 3.163 μs (31 allocs: 2.016 KiB, 0.01% compile time)
100 | 424.667 ns (1 allocs: 896 bytes) | 3.766 μs (31 allocs: 2.750 KiB, 0.15% compile time)
1000 | 4.090 μs (1 allocs: 7.938 KiB) | 9.798 μs (31 allocs: 9.812 KiB, <0.01% compile time)

In-place benchmarks (may be imprecise due to evals = 1, see https://chairmarks.lilithhafner.com/v1.2.1/tutorial#Common-pitfalls):

Size | Enzyme v0.11 | Enzyme v0.12
10 | 25.000 ns | 3.039 μs (28 allocs: 1.547 KiB, <0.01% compile time)
100 | 361.000 ns | 3.438 μs (28 allocs: 1.547 KiB, 0.16% compile time)
1000 | 4.279 μs | 7.439 μs (28 allocs: 1.547 KiB, 0.17% compile time)
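
To make the "up to 100x" figure in the title concrete, the slowdown ratios and absolute gaps can be computed from the timings reported in the two tables above. This is only arithmetic on the posted numbers (converted to nanoseconds); the variable and function names are chosen for this sketch.

```julia
# Timings in nanoseconds, copied from the two tables above.
sizes = [10, 100, 1000]
outofplace_v011 = [38.921, 424.667, 4090.0]
outofplace_v012 = [3163.0, 3766.0, 9798.0]
inplace_v011 = [25.0, 361.0, 4279.0]
inplace_v012 = [3039.0, 3438.0, 7439.0]

# Ratio of new to old timing (how many times slower v0.12 is).
slowdown(old, new) = new ./ old

# Absolute difference in nanoseconds.
gap(old, new) = new .- old

for (label, old, new) in (("out-of-place", outofplace_v011, outofplace_v012),
                          ("in-place", inplace_v011, inplace_v012))
    for (n, r, d) in zip(sizes, slowdown(old, new), gap(old, new))
        println(label, " n=", n, ": ", round(r; digits=1), "x slower, +",
                round(d; digits=0), " ns")
    end
end
```

The ratio shrinks as the array grows, while the in-place gap stays close to 3 μs at every size, which suggests a mostly fixed per-call overhead rather than a cost proportional to the input.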

adrhill commented 2 months ago

I can reproduce this when comparing Enzyme v0.12.0 against v0.11.20.

v0.11.20:

julia> bench_outofplace(10)
17.932 ns (1 allocs: 144 bytes)

julia> bench_outofplace(100)
54.717 ns (1 allocs: 896 bytes)

julia> bench_outofplace(1000)
347.222 ns (1 allocs: 8.000 KiB)

v0.12.0:

julia> bench_outofplace(10)
1.653 μs (31 allocs: 2.016 KiB)

julia> bench_outofplace(100)
1.722 μs (31 allocs: 2.750 KiB)

julia> bench_outofplace(1000)
2.521 μs (31 allocs: 9.875 KiB)

julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 12 × Apple M3 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 6 virtual cores)

wsmoses commented 2 months ago

My guess is that the recent change to use make_zero, which added support for more than just Julia arrays, is the cause here.

Will try to look at it this week and restore normal array performance.
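
The thread does not show the actual change, so the following is only a rough sketch of the general trade-off described here: a generic routine that can zero arbitrary nested structures, next to a specialized method that keeps plain numeric arrays on a cheap path. All names (`zero_like`, `Particle`) are hypothetical and this is not Enzyme's make_zero implementation.

```julia
# Hypothetical names throughout; this is NOT Enzyme's make_zero implementation.

# Generic path: numbers are zeroed directly ...
zero_like(x::Number) = zero(x)

# ... arrays with arbitrary elements are zeroed element by element ...
zero_like(x::AbstractArray) = map(zero_like, x)

# ... and any other struct is rebuilt field by field.
# Assumes the type has a default positional constructor.
function zero_like(x::T) where {T}
    zeroed = ntuple(i -> zero_like(getfield(x, i)), fieldcount(T))
    return T(zeroed...)
end

# Fast path: a dense array of numbers needs no recursion, just one
# allocation filled with zeros. Dispatch picks this more specific method.
zero_like(x::Array{T}) where {T<:Number} = fill!(similar(x), zero(T))

# Example type to exercise the generic struct path.
struct Particle
    mass::Float64
    position::Vector{Float64}
end

p = zero_like(Particle(2.0, [1.0, 2.0, 3.0]))  # goes through the struct path
v = zero_like([1.0, 2.0, 3.0])                 # goes through the array fast path
println(p)
println(v)
```

Keeping a dedicated method for the common array case is one way a library can support general structures without making the simplest inputs pay for the extra generality.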

On Mon, Apr 29, 2024 at 5:46 AM Guillaume Dalle wrote:

I think it is a consequence of #1401 https://github.com/EnzymeAD/Enzyme.jl/issues/1401

Enzyme v0.11

julia> using Chairmarks, Enzyme

julia> f(x) = sum(abs2, x)
f (generic function with 1 method)

julia> @be (x=rand(100), g=rand(100)) gradient!(Enzyme.Reverse, _.g, f, _.x)
Benchmark: 2542 samples with 86 evaluations
min    344.081 ns
median 346.872 ns
mean   346.897 ns
max    537.256 ns

(jl_mrTnl6) pkg> st
Status /tmp/jl_mrTnl6/Project.toml
  [0ca39b1e] Chairmarks v1.2.1
⌃ [7da242da] Enzyme v0.11.20
Info Packages marked with ⌃ have new versions available and may be upgradable.

Enzyme v0.12

julia> using Chairmarks, Enzyme

julia> f(x) = sum(abs2, x)
f (generic function with 1 method)

julia> @be (x=rand(100), g=rand(100)) gradient!(Enzyme.Reverse, _.g, f, _.x)
Benchmark: 2533 samples with 6 evaluations
min    3.513 μs (28 allocs: 1.547 KiB, <0.01% compile time)
median 3.915 μs (28 allocs: 1.547 KiB, 0.84% compile time)
mean   4.809 μs (28 allocs: 1.547 KiB, 0.04% gc time, 0.84% compile time)
max    2.012 ms (28 allocs: 1.547 KiB, 99.10% gc time, 12.06% compile time)

(jl_CnUVZ6) pkg> st
Status /tmp/jl_CnUVZ6/Project.toml
  [0ca39b1e] Chairmarks v1.2.1
  [7da242da] Enzyme v0.12.0


gdalle commented 2 months ago

Thanks! If you need beta testers, let me know!

wsmoses commented 2 months ago

@gdalle can you see if https://github.com/EnzymeAD/Enzyme.jl/pull/1415 resolves the issue?

gdalle commented 2 months ago

On a slower laptop:

Enzyme 0.11

julia> bench_inplace(10)
141.000 ns

julia> bench_inplace(100)
866.000 ns

julia> bench_outofplace(10)
140.321 ns (1 allocs: 144 bytes)

julia> bench_outofplace(100)
907.133 ns (1 allocs: 896 bytes)

PR on Enzyme 0.12

julia> bench_inplace(10)
5.569 μs (28 allocs: 1.547 KiB, <0.01% compile time)

julia> bench_inplace(100)
6.383 μs (28 allocs: 1.547 KiB, 0.03% compile time)

julia> bench_outofplace(10)
5.483 μs (29 allocs: 1.688 KiB, 0.03% compile time)

julia> bench_outofplace(100)
6.405 μs (29 allocs: 2.422 KiB, <0.01% compile time)

Still not there, it seems.

wsmoses commented 2 months ago

Should be fixed by https://github.com/EnzymeAD/Enzyme.jl/pull/1415; please reopen if it persists.

gdalle commented 2 months ago

fixed indeed, good job!