EnzymeAD / Enzyme.jl

Julia bindings for the Enzyme automatic differentiator
https://enzyme.mit.edu
MIT License
439 stars 62 forks source link

Duplicating closures #1669

Closed gdalle closed 1 month ago

gdalle commented 2 months ago

Hi there! I'm working on https://github.com/gdalle/DifferentiationInterface.jl/pull/375 and struggling to figure out what's wrong with the example below:

julia> using Enzyme: Enzyme

julia> data = [0.0]
1-element Vector{Float64}:
 0.0

julia> function f(x)
           copyto!(data, x)
           return sum(data)
       end
f (generic function with 1 method)

julia> Enzyme.autodiff(Enzyme.Reverse, f, Enzyme.Active, Enzyme.Duplicated([1.0], [2.0]))
ERROR: Enzyme execution failed.
Mismatched activity for:   ret {} addrspace(10)* %0, !dbg !34 const val: {} addrspace(10)* %0
 value=Unknown object of type Vector{Float64}
 llvalue={} addrspace(10)* %0
You may be using a constant variable as temporary storage for active memory (https://enzyme.mit.edu/julia/stable/faq/#Activity-of-temporary-storage). If not, please open an issue, and either rewrite this variable to not be conditionally active or use Enzyme.API.runtimeActivity!(true) as a workaround for now

Stacktrace:
 [1] copyto!
   @ ./array.jl:368
 [2] copyto!
   @ ./array.jl:388

Stacktrace:
  [1] throwerr(cstr::Cstring)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:1688
  [2] copyto!
    @ ./array.jl:368 [inlined]
  [3] copyto!
    @ ./array.jl:388 [inlined]
  [4] augmented_julia_copyto__6275wrap
    @ ./array.jl:0
  [5] macro expansion
    @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6658 [inlined]
  [6] enzyme_call
    @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6258 [inlined]
  [7] AugmentedForwardThunk
    @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6146 [inlined]
  [8] runtime_generic_augfwd(activity::Type{…}, width::Val{…}, ModifiedBetween::Val{…}, RT::Val{…}, f::typeof(copyto!), df::Nothing, primal_1::Vector{…}, shadow_1_1::Nothing, primal_2::Vector{…}, shadow_2_1::Vector{…})
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/YDcYf/src/rules/jitrules.jl:313
  [9] f
    @ ~/Work/GitHub/Julia/DifferentiationInterface.jl/DifferentiationInterface/test/Back/Enzyme/test.jl:103 [inlined]
 [10] augmented_julia_f_6238wrap
    @ ~/Work/GitHub/Julia/DifferentiationInterface.jl/DifferentiationInterface/test/Back/Enzyme/test.jl:0
 [11] macro expansion
    @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6658 [inlined]
 [12] enzyme_call
    @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6258 [inlined]
 [13] AugmentedForwardThunk
    @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6146 [inlined]
 [14] autodiff
    @ ~/.julia/packages/Enzyme/YDcYf/src/Enzyme.jl:258 [inlined]
 [15] autodiff(mode::EnzymeCore.ReverseMode{…}, f::typeof(f), ::Type{…}, args::EnzymeCore.Duplicated{…})
    @ Enzyme ~/.julia/packages/Enzyme/YDcYf/src/Enzyme.jl:326
 [16] top-level scope
    @ ~/Work/GitHub/Julia/DifferentiationInterface.jl/DifferentiationInterface/test/Back/Enzyme/test.jl:107
Some type information was truncated. Use `show(err)` to see complete types.

julia> Enzyme.autodiff(
           Enzyme.Reverse,
           Enzyme.Duplicated(f, Enzyme.make_zero(f)),
           Enzyme.Active,
           Enzyme.Duplicated([1.0], [2.0]),
       )
ERROR: Type of ghost or constant type EnzymeCore.Duplicated{typeof(f)} is marked as differentiable.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] enzyme!(job::GPUCompiler.CompilerJob{…}, mod::LLVM.Module, primalf::LLVM.Function, TT::Type, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, wrap::Bool, modifiedBetween::Tuple{…}, returnPrimal::Bool, expectedTapeType::Type, loweredArgs::Set{…}, boxedArgs::Set{…})
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:3653
  [3] codegen(output::Symbol, job::GPUCompiler.CompilerJob{…}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, toplevel::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:5902
  [4] codegen
    @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:5208 [inlined]
  [5] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6710
  [6] _thunk
    @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6710 [inlined]
  [7] cached_compilation
    @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6748 [inlined]
  [8] thunkbase(ctx::LLVM.Context, mi::Core.MethodInstance, ::Val{…}, ::Type{…}, ::Type{…}, tt::Type{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Type{…})
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6821
  [9] #s2021#28379
    @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6873 [inlined]
 [10] 
    @ Enzyme.Compiler ./none:0
 [11] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
    @ Core ./boot.jl:602
 [12] autodiff(::EnzymeCore.ReverseMode{…}, f::EnzymeCore.Duplicated{…}, ::Type{…}, args::EnzymeCore.Duplicated{…})
    @ Enzyme ~/.julia/packages/Enzyme/YDcYf/src/Enzyme.jl:257
 [13] top-level scope
    @ ~/Work/GitHub/Julia/DifferentiationInterface.jl/DifferentiationInterface/test/Back/Enzyme/test.jl:109
Some type information was truncated. Use `show(err)` to see complete types.

The first failure is expected, but in the second I don't understand the error message

wsmoses commented 2 months ago

You tried to mark a starless variable (ie f) as duplicated.

F has no data so this is both invalid and likely a bug from what code is calling it

On Tue, Jul 23, 2024 at 4:09 AM Guillaume Dalle @.***> wrote:

Hi there! I'm working on gdalle/DifferentiationInterface.jl#375 https://github.com/gdalle/DifferentiationInterface.jl/pull/375 and struggling to figure out what's wrong with the example below:

julia> using Enzyme: Enzyme

julia> data = [0.0]1-element Vector{Float64}: 0.0

julia> function f(x) copyto!(data, x) return sum(data) end f (generic function with 1 method)

julia> Enzyme.autodiff(Enzyme.Reverse, f, Enzyme.Active, Enzyme.Duplicated([1.0], [2.0])) ERROR: Enzyme execution failed. Mismatched activity for: ret {} addrspace(10) %0, !dbg !34 const val: {} addrspace(10) %0 value=Unknown object of type Vector{Float64} llvalue={} addrspace(10)* %0 You may be using a constant variable as temporary storage for active memory (https://enzyme.mit.edu/julia/stable/faq/#Activity-of-temporary-storage). If not, please open an issue, and either rewrite this variable to not be conditionally active or use Enzyme.API.runtimeActivity!(true) as a workaround for now

Stacktrace: [1] copyto! @ ./array.jl:368 [2] copyto! @ ./array.jl:388

Stacktrace: [1] throwerr(cstr::Cstring) @ Enzyme.Compiler ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:1688 [2] copyto! @ ./array.jl:368 [inlined] [3] copyto! @ ./array.jl:388 [inlined] [4] augmented_julia_copyto__6275wrap @ ./array.jl:0 [5] macro expansion @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6658 [inlined] [6] enzyme_call @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6258 [inlined] [7] AugmentedForwardThunk @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6146 [inlined] [8] runtime_generic_augfwd(activity::Type{…}, width::Val{…}, ModifiedBetween::Val{…}, RT::Val{…}, f::typeof(copyto!), df::Nothing, primal_1::Vector{…}, shadow_1_1::Nothing, primal_2::Vector{…}, shadow_2_1::Vector{…}) @ Enzyme.Compiler ~/.julia/packages/Enzyme/YDcYf/src/rules/jitrules.jl:313 [9] f @ ~/Work/GitHub/Julia/DifferentiationInterface.jl/DifferentiationInterface/test/Back/Enzyme/test.jl:103 [inlined] [10] augmented_julia_f_6238wrap @ ~/Work/GitHub/Julia/DifferentiationInterface.jl/DifferentiationInterface/test/Back/Enzyme/test.jl:0 [11] macro expansion @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6658 [inlined] [12] enzyme_call @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6258 [inlined] [13] AugmentedForwardThunk @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6146 [inlined] [14] autodiff @ ~/.julia/packages/Enzyme/YDcYf/src/Enzyme.jl:258 [inlined] [15] autodiff(mode::EnzymeCore.ReverseMode{…}, f::typeof(f), ::Type{…}, args::EnzymeCore.Duplicated{…}) @ Enzyme ~/.julia/packages/Enzyme/YDcYf/src/Enzyme.jl:326 [16] top-level scope @ ~/Work/GitHub/Julia/DifferentiationInterface.jl/DifferentiationInterface/test/Back/Enzyme/test.jl:107 Some type information was truncated. Use show(err) to see complete types.

julia> Enzyme.autodiff( Enzyme.Reverse, Enzyme.Duplicated(f, Enzyme.make_zero(f)), Enzyme.Active, Enzyme.Duplicated([1.0], [2.0]), ) ERROR: Type of ghost or constant type EnzymeCore.Duplicated{typeof(f)} is marked as differentiable. Stacktrace: [1] error(s::String) @ Base ./error.jl:35 [2] enzyme!(job::GPUCompiler.CompilerJob{…}, mod::LLVM.Module, primalf::LLVM.Function, TT::Type, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, wrap::Bool, modifiedBetween::Tuple{…}, returnPrimal::Bool, expectedTapeType::Type, loweredArgs::Set{…}, boxedArgs::Set{…}) @ Enzyme.Compiler ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:3653 [3] codegen(output::Symbol, job::GPUCompiler.CompilerJob{…}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, toplevel::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing) @ Enzyme.Compiler ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:5902 [4] codegen @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:5208 [inlined] [5] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool) @ Enzyme.Compiler ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6710 [6] _thunk @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6710 [inlined] [7] cached_compilation @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6748 [inlined] [8] thunkbase(ctx::LLVM.Context, mi::Core.MethodInstance, ::Val{…}, ::Type{…}, ::Type{…}, tt::Type{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Type{…}) @ Enzyme.Compiler ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6821 [9] #s2021#28379 @ ~/.julia/packages/Enzyme/YDcYf/src/compiler.jl:6873 [inlined] [10] @ Enzyme.Compiler ./none:0 [11] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any}) @ Core ./boot.jl:602 [12] autodiff(::EnzymeCore.ReverseMode{…}, f::EnzymeCore.Duplicated{…}, ::Type{…}, args::EnzymeCore.Duplicated{…}) @ Enzyme ~/.julia/packages/Enzyme/YDcYf/src/Enzyme.jl:257 [13] top-level scope @ ~/Work/GitHub/Julia/DifferentiationInterface.jl/DifferentiationInterface/test/Back/Enzyme/test.jl:109 Some type information was truncated. Use show(err) to see complete types.

The first failure is expected, but in the second I don't understand the error message

— Reply to this email directly, view it on GitHub https://github.com/EnzymeAD/Enzyme.jl/issues/1669, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJTUXFBXJ2SODAZXBW2LCLZNYFSVAVCNFSM6AAAAABLJ4C4EGVHI2DSMVQWIX3LMV43ASLTON2WKOZSGQZDINJVGAYTKMQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

gdalle commented 2 months ago

I thought f was a closure over data? Also what does starless mean? In my mind, this was the same case we were discussing with Chris of closures over external objects. The implementation I had in mind for AutoEnzyme(constant_function=false) would always try to build Duplicated(f, make_zero(f)), but how can I predict whether it will fail?

gdalle commented 2 months ago

For contrast, the second autodiff call works when applied to this function g:

julia> function make_f(data)
           return x -> sum(copyto!(data, x))
       end
make_f (generic function with 1 method)

julia> g = make_f(data);

Conceptually, I thought it was the same as f?

wsmoses commented 2 months ago

Sorry, meant to say stateless (was typo).

Julia compiles functions with globals differently from closures. your g closure you can do g.data. In contrast your original f is not a closure and load and stores to a global

gdalle commented 2 months ago

Since the default setting is gonna be AutoEnzyme(constant_function=false), is there a way that I can do Duplicated(f, df) systematically and get away with it, even when f is constant with no data like f=exp?

wsmoses commented 2 months ago

The default really should be constant=true imo. And yeah unfortunately not atm

wsmoses commented 2 months ago

Reason being that if someone does say active(nothing), throwing an error is a good way to help users who may have made a set up mistake

gdalle commented 2 months ago

@ChrisRackauckas we issued a release of ADTypes where the default is constant_function=false, because I thought that it would be the most correct thing in every case. Turns out, if f is not a closure of some kind, I cannot do Duplicated(f, df), so constant_function=false will error. In light of this and the discussion above, should we switch the ADTypes default even though it is technically breaking?

wsmoses commented 2 months ago

I'm all in favor of having the option, but the default should be constant. Attempting to differentiate the closure itself is the default of all tools [including Enzyme], and may result in errors which may not happen when it is constant

wsmoses commented 2 months ago

@ChrisRackauckas if that was released, can we yank the broken ADTypes then and fix this back to the default. Otherwise this is a breaking change since signiicant amounts of code will break with non-constant f but succeed with constant f

and in the future can I be added as a reviewer and/or cc'd for Enzyme related ADType changes

ChrisRackauckas commented 2 months ago

No, that's not the right default. You cannot assume that someone using DifferentialEquations.jl has read every line of the Enzyme documentation, or has even seen it the docs of Enzyme.jl at all. You need to assume that every valid function either tries as hard as possible to get the correct answer, or errors. Anything else is a performance option that should be opt-in. Assuming the function is constant does not meet that criteria unless Enzyme has had recent changes that force it to error when enclosed values are hit, which I don't think it has.

gdalle commented 2 months ago

In addition to the philosophical debate we've had together several times, this issue is about a practical problem. I don't see a way to implement constant_function=false in DI that also works when f is actually constant. And according to Billy, there may not be one.

wsmoses commented 2 months ago

You're allowed to make that the default within Diffeq.jl if non-specified [e.g. left as nothing], but it shouldn't be set to the default for everyone. Like shown here and elsewhere in even more cryptic error messages, setting constant=false can cause diferentiation to fail when it was previously working (and is regardless a very breaking change).

Perhaps have this be an optional bool, and if its set to nothing the package decides? But please don't have it as a default=false

ChrisRackauckas commented 2 months ago

If that's the case then we just cannot default to Enzyme since it would need user input, so it should be set to non-optional and all downstream packages should be documented to refrain from defaulting to Enzyme as otherwise it cannot guarantee the right output.

gdalle commented 2 months ago

Again, none of that matters if I can't get constant_function=false to work in general, even in the absence of a closure. That means being more clever than what I'm doing at the moment, namely

https://github.com/gdalle/DifferentiationInterface.jl/blob/09b1f24daf3b9b7c40b9e03f0d98d1312317def3/DifferentiationInterface/ext/DifferentiationInterfaceEnzymeExt/DifferentiationInterfaceEnzymeExt.jl#L77-L84

wsmoses commented 2 months ago

For posterity, here is a sample case of where const works but duplicated breaks:

struct myclosure
    data::Vector{Float64}
end

function (c::myclosure)(x)
    d = weird_ccall(c.data)
    return x * d
end

diferentiating the closure function, if constant, just requires differentiating a mul which is fine. Differentiating a duplicated closure requires differentiating the weird ccall -- which may not be possible and throw an error.

I'm all for having it as an option to consider const/non-const closures [which is why we support it], but changing the behavior is a active choice which will cause code to no longer work.

ChrisRackauckas commented 2 months ago

Our actively used codes already made the interpretation of constant_function=false and so the other direction would be breaking (and indeed the other direction would break tests)

wsmoses commented 2 months ago

@ChrisRackauckas to be clear this isn't a correctness error, but a "throws non-differentiable code" error, so is fine in your use case to always set const=false since if it throws you use a fallback anyways. But some folk who assumed the other way presently are broken.

And yes I concur that in your case you made the other assumption.

Hence -- can we make it an optional{bool} and have "nothing" be the default. And thus backends can choose the appropriate default

gdalle commented 2 months ago

I'll leave you two to fight it out 😅 but again, regardless of the default, I need help to make the non constant version work in DI for generic functions

ChrisRackauckas commented 2 months ago

There has been an assumption of const=false on implementations since it was noticed as required to compute the correct values in https://github.com/SciML/SciMLSensitivity.jl/pull/962

wsmoses commented 2 months ago

Yeah but that's an assumption in one package. Other packages have made opposing assumptions so it's breaking for them.

I'm not disagreeing that you can continue to make the assumption which was best for your use case, but others have reasonable assumptions for them which were different.

=> Let's not force any assumption to break downstream, and make it optional

ChrisRackauckas commented 2 months ago

Yeah but that's an assumption in one package. Other packages have made opposing assumptions so it's breaking for them.

That was the assumption of SciMLSensitivity and in theory Optimization, though it did have some bugs. Now it's the documented assumption.

wsmoses commented 2 months ago

And it was the opposite assumption/default made by Enzyme itself, among other user packages.

ChrisRackauckas commented 2 months ago

I'm not disagreeing that you can continue to make the assumption which was best for your use case, but others have reasonable assumptions for them which were different.

We want a single assumption all throughout any of our use cases since if you can take the same function and get a different value in Optimization than if you put it into a differential equation, we'd be living in a world where the rabbits are chasing the wolves.

Now the thing that doesn't seem to be mentioned is:

You may be using a constant variable as temporary storage for active memory (https://enzyme.mit.edu/julia/stable/faq/#Activity-of-temporary-storage). If not, please open an issue, and either rewrite this variable to not be conditionally active or use Enzyme.API.runtimeActivity!(true) as a workaround for now

seems to be a new error. Do we now get a guarantee that any value written to a closure will throw this error? Because if there's now an error then it can be okay. We'd want to add an error hint so that people know how to handle this, but that's easy to do from SciMLBase.

wsmoses commented 2 months ago

The error message has been around for a bit more than a year: https://github.com/EnzymeAD/Enzyme.jl/pull/812

It will arise if trying to store a (mixed)duplicated piece of data into constant storage/return/etc as Enzyme cannot ensure correctness in these cases without runtime activity.

A caveat is that it will not trigger when storing active (aka double) data into a constant storage, as Enzyme can ensure correctness assuming the input argument activities are correct [e.g. any variable marked constant is constant].

gdalle commented 1 month ago

So after a long discussion on Slack, @wsmoses proposed that Enzyme should implement an error analysis pass.

At which point enzyme’s behavior will be: closure marked constant, make the standard assumptions it is non differentiable; marked duplicated, differentiate the closure; unmarked, do the same as const but run a conservative analysis that will throw an error if there would have needed to be a shadow for the closure