Open KristofferC opened 1 year ago
This should now be resolved by https://github.com/EnzymeAD/Enzyme.jl/pull/881
I am not sure that fixed all instances. Need to confirm with an actual profile
I just want to point out that this is still happening and causes ~18s load time increase to any package that recursively ends up getting Enzyme as a dependency.
hm @KristofferC do you have any information regarding what could be causing this?
Enzyme.__init__
.Enzyme.__init__
in invalidated and has to be recompiled at load time.Enzyme.__init__
takes about 18 seconds.is there a tool or something to profile what methods/invalidations are the source?
One would usually use SnoopCompile. I can try (later) to see if I can find what actually ends up invalidating it.
I will note that the Enzyme Blas init loader presently is expected to have a long load time and I'm not sure of how to fix that.
Specifically here: https://github.com/EnzymeAD/Enzyme.jl/blob/982949462e7e0a2d07ccf4158c440561117dc152/src/compiler/validation.jl#L67
We need the runtime address of blas and julia runtime libraries to be able to identify what function is being ccall'ed (in IR it is an integer). This is definitionally runtime information so it couldn't be preserved. If you have ideas for a design that avoid the expense, I'm all ears.
If it's somewhere else causing the issue definitely interested in what's causing it so we can fix it.
julia> @time using Enzyme
0.383814 seconds (566.30 k allocations: 33.282 MiB, 21.24% compilation time: 27% of which was recompilation)
But:
julia> @time using Tracker
1.450969 seconds (1.83 M allocations: 109.970 MiB, 7.33% gc time, 31.56% compilation time: 26% of which was recompilation)
julia> @time using Enzyme
21.592203 seconds (43.97 M allocations: 2.517 GiB, 1.84% gc time, 98.84% compilation time: 100% of which was recompilation)
is some sort of MWE. Will narrow it down more.
Okej, Tracker committed the mortal sin of defining show
on a Type
. With https://github.com/FluxML/Tracker.jl/pull/156 this is improved to
julia> @time using Tracker
1.463012 seconds (1.90 M allocations: 113.004 MiB, 4.50% gc time, 25.19% compilation time: 16% of which was recompilation)
julia> @time using Enzyme
0.590007 seconds (589.92 k allocations: 33.254 MiB, 14.00% gc time, 56.16% compilation time: 99% of which was recompilation)
Anyway, as a question, is there anyway this work can be moved from __init__
to until when it is actually needed? So you pay the cost if you use it but otherwise not?
It seems to me that that __init__()
is just setting up ptr_map
which is only used by restore_lookup
, which could just do an initialization check, and if ptr_map
is not initialized, it runs that setup code.
From some testing, it seems the __init__
that takes a long time to compile is https://github.com/EnzymeAD/Enzyme.jl/blob/019258a55ce7f9699c91cb87bb584640dd5b33e2/src/compiler.jl#L2425-L2454
Okay if that init takes longer to compile I am a bit more unclear how to resolve it.
So some context, this code interfaces between Enzyme proper (aka Enzyme_jll) and Enzyme.jl to registister all of the julia specific handling. As a result this actually is most of Enzyme.jl's code, so this is highly desirable to be precompiled.
this is done by taking many functions in Julia code, making a cfunction of it, then passing the Julia cfunction runtime pointer into Enzyme_jll. While the actual call to pass the pointer (and the pointer itself) are runtime specific values, the actual compilation of those cfunctions on those specified types is fixed, and should be in theory able to be precompiled (but I presume not in practice here).
Those cfunctions are wrapped in a macro (see here: https://github.com/EnzymeAD/Enzyme.jl/blob/019258a55ce7f9699c91cb87bb584640dd5b33e2/src/rules/llvmrules.jl#L1030 ) which creates an anonymous wrapper function to make it easier, which perhaps makes things more difficult.
Regardless, I'm not sure what the state of julia internals for precompilation are here, but that's what I can see from our end.
Running the register_llvm_rules()
in a module where optimizations are turned off more or less fixes the bad behavior:
However, I have no idea what impact this has on runtime performance so that would have to be investigated. Might not be a good idea at all.
Yeah disabling optimizations seems like a bad idea.
Is there any other way we can cut down on load time here?
On Mon, Nov 20, 2023 at 3:29 AM Kristoffer Carlsson < @.***> wrote:
Running the register_llvm_rules() in a module where optimizations are turned off more or less fixes the bad behavior:
@.***
diff-21145f7299f4174cfb41f7c0c845a0ee798fc065bae0a352171552dc4b810508R1033
However, I have no idea what impact this has on runtime performance so that would have to be investigated. Might not be a good idea at all.
— Reply to this email directly, view it on GitHub https://github.com/EnzymeAD/Enzyme.jl/issues/776#issuecomment-1818776971, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJTUXDWSHKOS7FYUK3RHJDYFMWGZAVCNFSM6AAAAAAXJ25G6OVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMJYG43TMOJXGE . You are receiving this because you modified the open/close state.Message ID: @.***>
Is there any other way we can cut down on load time here?
I guess these are the possibilities:
__init__
to precompilation.Yeah disabling optimizations seems like a bad idea.
I think disabling optimizations for the registration functions should be fine. But I wonder if it impacts the callbacks, if it is only the trampoline that should be fine.
But I wonder if it impacts the callbacks, if it is only the trampoline that should be fine.
That's what I was worried about as well. I think the compilation time was in fact partly due to inferring the callbacks so my change making things faster probably means the callbacks also stopped getting inferred...
Concretly we are spending ~30s on compiling init
When loading OmniPackage.jl(https://github.com/JuliaComputing/OmniPackage.jl/) a big part of its load time (around 10s) is spent compiling and running the
__init__
method in Enzyme, profile available at https://kristofferc.github.io/tracy-traces/omnipackage/.Ideally, Enzyme would somehow protect itself against this, by either making the code in
__init__
less vulnerable to invalidations or some other way (maybe using a frozen world age).cc @vchuravy