gdalle / DifferentiationInterface.jl

An interface to various automatic differentiation backends in Julia.
https://gdalle.github.io/DifferentiationInterface.jl/DifferentiationInterface
MIT License

Enzyme hessian with closure: "You may be using a constant variable as temporary storage for active memory" #307

Open jbrea opened 3 weeks ago

jbrea commented 3 weeks ago
julia> g(x, y) = sum((x .* y).^2)
g (generic function with 1 method)

julia> f = let y = rand(10)
           x -> g(x, y)
       end
#57 (generic function with 1 method)

julia> hessian(f, AutoEnzyme(), rand(10))
ERROR: Enzyme execution failed.
Mismatched activity for:   store {} addrspace(10)* %.fca.0.0.extract, {} addrspace(10)* addrspace(10)* %memcpy_refined_dst.i, align 8, !dbg !17, !tbaa !21, !alias.scope !25, !noalias !28 const val:   %.fca.0.0.extract = extractvalue { [1 x {} addrspace(10)*] } %0, 0, 0, !dbg !8
Type tree: {[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@double, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}
 llvalue=  %.fca.0.0.extract = extractvalue { [1 x {} addrspace(10)*] } %0, 0, 0, !dbg !8
You may be using a constant variable as temporary storage for active memory (https://enzyme.mit.edu/julia/stable/faq/#Activity-of-temporary-storage). If not, please open an issue, and either rewrite this variable to not be conditionally active or use Enzyme.API.runtimeActivity!(true) as a workaround for now

Stacktrace:
 [1] gradient
   @ ~/.julia/packages/DifferentiationInterface/IH8L9/src/first_order/gradient.jl:84
 [2] inner_gradient_closure
   @ ~/.julia/packages/DifferentiationInterface/IH8L9/src/second_order/hvp.jl:99
 [3] inner_gradient_closure
   @ ~/.julia/packages/DifferentiationInterface/IH8L9/src/second_order/hvp.jl:0

Stacktrace:
  [1] throwerr(cstr::Cstring)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/F71IJ/src/compiler.jl:1338
  [2] gradient
    @ ~/.julia/packages/DifferentiationInterface/IH8L9/src/first_order/gradient.jl:84 [inlined]
  [3] inner_gradient_closure
    @ ~/.julia/packages/DifferentiationInterface/IH8L9/src/second_order/hvp.jl:99 [inlined]
  [4] inner_gradient_closure
    @ ~/.julia/packages/DifferentiationInterface/IH8L9/src/second_order/hvp.jl:0 [inlined]
  [5] fwddiffejulia_inner_gradient_closure_9648_inner_1wrap
    @ ~/.julia/packages/DifferentiationInterface/IH8L9/src/second_order/hvp.jl:0
  [6] macro expansion
    @ ~/.julia/packages/Enzyme/F71IJ/src/compiler.jl:5916 [inlined]
  [7] enzyme_call
    @ ~/.julia/packages/Enzyme/F71IJ/src/compiler.jl:5566 [inlined]
  [8] ForwardModeThunk
    @ ~/.julia/packages/Enzyme/F71IJ/src/compiler.jl:5446 [inlined]
  [9] autodiff
    @ ~/.julia/packages/Enzyme/F71IJ/src/Enzyme.jl:399 [inlined]
 [10] pushforward
    @ ~/.julia/packages/DifferentiationInterface/IH8L9/ext/DifferentiationInterfaceEnzymeExt/forward_onearg.jl:28 [inlined]
 [11] hvp_aux
    @ ~/.julia/packages/DifferentiationInterface/IH8L9/src/second_order/hvp.jl:166 [inlined]
 [12] hvp
    @ ~/.julia/packages/DifferentiationInterface/IH8L9/src/second_order/hvp.jl:154 [inlined]
 [13] #31
    @ ~/.julia/packages/DifferentiationInterface/IH8L9/src/second_order/hessian.jl:67 [inlined]
 [14] #186
    @ ./none:0 [inlined]
 [15] iterate
    @ ./generator.jl:47 [inlined]
 [16] _typed_stack(::Colon, ::Type{…}, ::Type{…}, A::Base.Generator{…}, Aax::Tuple{…})
    @ Base ./abstractarray.jl:2817
 [17] _typed_stack
    @ ./abstractarray.jl:2817 [inlined]
 [18] _stack
    @ ./abstractarray.jl:2807 [inlined]
 [19] _stack
    @ ./abstractarray.jl:2799 [inlined]
 [20] stack
    @ ./abstractarray.jl:2796 [inlined]
 [21] hessian
    @ ~/.julia/packages/DifferentiationInterface/IH8L9/src/second_order/hessian.jl:66 [inlined]
 [22] hessian
    @ ~/.julia/packages/DifferentiationInterface/IH8L9/src/second_order/hessian.jl:57 [inlined]
 [23] hessian(f::var"#57#58"{Vector{Float64}}, backend::AutoEnzyme{Nothing}, x::Vector{Float64})
    @ DifferentiationInterface ~/.julia/packages/DifferentiationInterface/IH8L9/src/second_order/hessian.jl:57
 [24] top-level scope
    @ REPL[123]:1
Some type information was truncated. Use `show(err)` to see complete types.
gdalle commented 3 weeks ago

Hi! I think it's a problem with the type-instability of the function definition. A closure creates type-instability, and Enzyme does not like that. For instance, the following works:

julia> h(x) = sum((x .* x) .^ 2)
h (generic function with 1 method)

julia> hessian(h, AutoEnzyme(), rand(10))
10×10 Matrix{Float64}:
 7.93735  0.0      0.0      0.0      …  0.0      0.0     0.0      0.0
 0.0      2.85219  0.0      0.0         0.0      0.0     0.0      0.0
 0.0      0.0      6.47397  0.0         0.0      0.0     0.0      0.0
 0.0      0.0      0.0      4.03136     0.0      0.0     0.0      0.0
 0.0      0.0      0.0      0.0         0.0      0.0     0.0      0.0
 0.0      0.0      0.0      0.0      …  0.0      0.0     0.0      0.0
 0.0      0.0      0.0      0.0         4.34344  0.0     0.0      0.0
 0.0      0.0      0.0      0.0         0.0      9.0601  0.0      0.0
 0.0      0.0      0.0      0.0         0.0      0.0     8.02508  0.0
 0.0      0.0      0.0      0.0         0.0      0.0     0.0      6.5375
gdalle commented 3 weeks ago

I suggest you open the same issue on the Enzyme side

jbrea commented 3 weeks ago

I think it's a problem with the type-instability of the function definition. A closure creates type-instability

Are you sure? I don't see why this should be type unstable, and JET doesn't report any type-instability.

Also, the error message "You may be using a constant variable as temporary storage for active memory" rather suggests that a Const is used instead of a Duplicated (or BatchDuplicated) at some point, no?
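For what it's worth, the capture can be checked directly (a quick sketch, not from the original thread): a closure built in a let block stores y as a concretely typed field, so there is no Core.Box and no type instability in the capture.

```julia
g(x, y) = sum((x .* y) .^ 2)

# Closure built inside a `let` block: `y` is never reassigned, so it is
# captured as a concretely typed field rather than a Core.Box.
f = let y = rand(10)
    x -> g(x, y)
end

fieldtypes(typeof(f))  # (Vector{Float64},)
```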

gdalle commented 3 weeks ago

Oh, you're right, the let block probably makes it type-stable.

I think https://github.com/gdalle/DifferentiationInterface.jl/pull/309 might fix it, can you try out the PR branch with your toy problem or your actual problem and see?

gdalle commented 3 weeks ago

Oops it breaks more than it fixes :scream:

gdalle commented 3 weeks ago

@wsmoses any clues here?

wsmoses commented 3 weeks ago

I believe the problem here is the closure.

What if you use Enzyme's autodiff directly rather than DI (and actually pass multiple arguments, with y marked Const)?

wsmoses commented 3 weeks ago

Given that the backtrace comes from DI.inner_gradient_closure, my hypothesis is indeed that the closure contains both active memory and constant memory, and that the store of constant memory into the active closure is the problem described in the linked FAQ.

So I recommend calling autodiff directly rather than DI, as DI doesn't support multiple arguments [and thus requires the closure].
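A minimal sketch of that direct call (assuming Enzyme's standard activity annotations; the values here are illustrative, not taken from the thread):

```julia
using Enzyme

g(x, y) = sum((x .* y) .^ 2)

x  = rand(10)
y  = rand(10)
dx = zero(x)  # gradient accumulator, filled in-place by Enzyme

# Reverse-mode gradient w.r.t. x only: Duplicated(x, dx) marks x active,
# while Const(y) keeps y out of the differentiated state entirely, so no
# constant memory ends up stored as active memory.
Enzyme.autodiff(Reverse, g, Active, Duplicated(x, dx), Const(y))

# Analytically, dx should now hold 2 .* x .* y .^ 2
```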

jbrea commented 3 weeks ago

What if you use enzyme autodiff rather than DI

Yes, sure, I know how to do it with Enzyme directly, but I hoped I could simplify my code with DifferentiationInterface.

wsmoses commented 3 weeks ago

Yeah, this is a known issue for users of DI: it presently doesn't support effective usage of Enzyme, specifically multiple arguments and per-argument activity.

I think it would be prudent for DI to add that, but until then DI will impose unnecessary limitations on what Enzyme can presently differentiate correctly and efficiently.

I wrote some comments about this https://github.com/tpapp/LogDensityProblemsAD.jl/pull/29#pullrequestreview-2051253729 / https://github.com/tpapp/LogDensityProblemsAD.jl/pull/29#discussion_r1597685998

Unfortunately, when I commented "Closures were one of the reasons imo that AD.jl didn't take off, so long term I think DI.jl would be well served by both not introducing them itself (my understanding is that it does this successfully), but also not forcing users to create the same closures user-side (which would create similar issues).", the response was "DI is unlikely to support either multiple arguments or activity specification anytime soon".

@gdalle given the number of users who need this, do you think it is possible to prioritize it higher?

wsmoses commented 3 weeks ago

@jbrea if you'd like I can also add some nice syntax sugar like Enzyme.hessian (with multiple arguments). Would that be useful to you?

jbrea commented 3 weeks ago

I can also add some nice syntax sugar like Enzyme.hessian

Oh, that's kind, thanks. If it is a super quick thing to do, why not. But it doesn't bother me much. ForwardDiff is usually fast enough for the Hessians I need. Failed type analysis with Enzyme bothers me more (e.g. https://github.com/EnzymeAD/Enzyme.jl/issues/1410; I also ran into a failed type analysis error for a gradient calculation that I couldn't condense to an MWE yet).

gdalle commented 3 weeks ago

Would that be useful to you?

It would certainly be useful to me! Right now DI is the only simple way to compute many operators with Enzyme (most notably the hessian), it would be awesome to see some of those operators taken care of. Feel free to ping me on the PR for review if you end up doing it!

wsmoses commented 3 weeks ago

I'm not sure "only way" is accurate. We describe how to make Hessians in our basics guide: https://enzyme.mit.edu/index.fcgi/julia/stable/generated/autodiff/
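For context, the forward-over-reverse recipe from that guide looks roughly like the sketch below (hedged: grad! and the use of autodiff_deferred for the nested reverse pass follow the guide's pattern, but the exact API may differ across Enzyme versions):

```julia
using Enzyme

f(x) = sum((x .* x) .^ 2)

# Inner reverse pass: accumulate the gradient of f into dx.
# autodiff_deferred is the entry point the guide uses for nested AD.
function grad!(x, dx)
    Enzyme.autodiff_deferred(Reverse, f, Active, Duplicated(x, dx))
    return nothing
end

x  = rand(10)
v  = [1.0; zeros(9)]  # direction along which to probe the Hessian
dx = zero(x)
hv = zero(x)          # receives the Hessian-vector product H * v

# Outer forward pass: differentiate grad! along v; hv becomes one
# column of the Hessian (here the first, since v is the first basis vector).
Enzyme.autodiff(Forward, grad!, Duplicated(x, v), Duplicated(dx, hv))
```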

gdalle commented 3 weeks ago

Which is why I said "only simple way" ;)

wsmoses commented 3 weeks ago

Ah missed that, my b

gdalle commented 3 weeks ago

No worries! Enzyme is amazing, its only flaw is a rather steep entrance cost, but that's why I'm working on lowering it :)

I opened an issue on the Enzyme side to discuss:

wsmoses commented 3 weeks ago

Also I stand corrected, I forgot https://github.com/SciML/OptimizationBase.jl/blob/b2251ec6831c3631fe6db74e8f1b7bdefeb9c0b8/ext/OptimizationEnzymeExt.jl#L38 served as an easy way to use Enzyme for hessians as well.

gdalle commented 3 weeks ago

The issue with multiple arguments or activities is not adding them, it's testing them properly in all possible cases. The reason DI is so reliable is that we have thousands of LOC testing every operator in every situation. Adding activities multiplies the number of possible cases by at least 2, so I shudder to think of the effort necessary.

gdalle commented 3 weeks ago

I have opened #311 to discuss this specifically