EnzymeAD / Enzyme.jl

Julia bindings for the Enzyme automatic differentiator
https://enzyme.mit.edu
MIT License

Gradient error on simple forward/reverse comparison #701

Closed · MilesCranmer closed this issue 1 month ago

MilesCranmer commented 1 year ago

@vchuravy I am trying to create an example for the docs as you suggested on my tweet: https://twitter.com/MilesCranmer/status/1643782144941080576?s=20

I was wondering if you could help me with this? Here is what I have so far:

import ForwardDiff, ReverseDiff, Enzyme

function f(x, params)
    for i = 1:(length(params))
        x = @. cos(x + params[i])
    end
    return x
end

function forward(x, params)
    return ForwardDiff.gradient(p -> sum(f(x, p)), params)
end

function enzyme_forward(x, params)
    return Enzyme.gradient(Enzyme.Forward, p -> sum(f(x, p)), params)
end

However, this immediately gives the incorrect gradient:

julia> enzyme_forward([1.0], [1.0, 0.5])
(0.15231628671881603, -0.007596787769422761)

versus

julia> forward([1.0], [1.0, 0.5])
(0.07615814335940801, -0.08375493112883077)

There are no warnings or error messages either; it's just incorrect. Any idea why? Does it not like closures?

If I unroll f manually and insert x, it works:

julia> f(params) = cos(cos(1.0 + params[1]) + params[2])
f (generic function with 2 methods)

julia> Enzyme.gradient(Enzyme.Forward, f, [1.0, 0.5])
(0.07615814335940801, -0.08375493112883077)

Cheers, Miles

wsmoses commented 1 year ago

Since you're modifying x in place, you also need to pass a shadow x, which the simpler gradient wrapper doesn't support.

Since the gradient wrapper you're using runs in vector (batch) mode, I've done the same below:

values(only(autodiff(
    Forward,
    (x, p) -> sum(f(x, p)),
    BatchDuplicatedNoNeed,
    BatchDuplicated(x, ((zero(x) for _ in 1:length(params))...,)),
    BatchDuplicated(params, onehot(params)),
)))

You could also do the same by using a Duplicated closure.
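
For reference, that batched call can be wrapped into a self-contained function, roughly like the sketch below (the wrapper name enzyme_forward_batched is illustrative and assumes f, x, and params from the original post):

using Enzyme

# Illustrative wrapper around the batched forward-mode call above:
# x gets zero shadows (one per batch lane), params gets one-hot shadows,
# so each lane yields the derivative w.r.t. one entry of params.
function enzyme_forward_batched(x, params)
    xshadows = ((zero(x) for _ in 1:length(params))...,)
    return values(only(autodiff(
        Forward,
        (x, p) -> sum(f(x, p)),
        BatchDuplicatedNoNeed,
        BatchDuplicated(x, xshadows),
        BatchDuplicated(params, onehot(params)),
    )))
end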

MilesCranmer commented 1 year ago

Thanks!

> Since you're modifying x in place, you also need to pass a shadow x, which the simpler gradient wrapper doesn't support.

Not sure I understand this point. I understand that this:

@. x = ...

is modifying x in-place, but this:

x = @. ...

is not modifying x in-place, right? (i.e., the @. should only apply to what comes after it). The function I pasted above uses the second form.
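
A quick REPL illustration of that distinction (added for clarity; not part of the original comment):

julia> x = [1.0, 2.0]; y = x;

julia> @. x = cos(x); x === y   # in-place: the existing array is mutated
true

julia> x = @. cos(x); x === y   # out-of-place: x is rebound to a freshly allocated array
false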

MilesCranmer commented 1 year ago

By the way, sorry to stay on the same issue for a follow-up question, but how would I modify your code to work for Reverse? It doesn't seem to work as-is:

- return values(only(autodiff(Forward, (x, p) -> sum(f(x, p)), BatchDuplicatedNoNeed, BatchDuplicated(x, ((zero(x) for _ in 1:length(params))...,)), BatchDuplicated(params, onehot(params)))))
+ return values(only(autodiff(Reverse, (x, p) -> sum(f(x, p)), BatchDuplicatedNoNeed, BatchDuplicated(x, ((zero(x) for _ in 1:length(params))...,)), BatchDuplicated(params, onehot(params)))))

gives

ERROR: Duplicated Returns not yet handled
Stacktrace:
 [1] autodiff(::EnzymeCore.ReverseMode, ::var"#11#13", ::Type{BatchDuplicatedNoNeed}, ::BatchDuplicated{Vector{Float64}, 1}, ::Vararg{BatchDuplicated{Vector{Float64}, 1}})
   @ Enzyme ~/.julia/packages/Enzyme/DIkTv/src/Enzyme.jl:209
 [2] reverse(x::Vector{Float64}, params::Vector{Float64})
   @ Main ~/forward_vs_reverse/benchmark.jl:37
 [3] top-level scope
   @ ~/forward_vs_reverse/benchmark.jl:42

wsmoses commented 1 year ago

dparams = zero(params)
autodiff(Reverse, (x, p) -> sum(f(x, p)), Active, Duplicated(x, zero(x)), Duplicated(params, dparams))
return dparams

Here, since reverse mode gives you the entire derivative in one sweep, you don't need batching. As for the "Duplicated Returns" error you saw: note that the return activity of this call is changed to Active.
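
Wrapped into a self-contained function, that reverse-mode call might look like this sketch (the name enzyme_reverse is illustrative and assumes f from the original post):

using Enzyme

# Reverse mode accumulates the full gradient of sum(f(x, p)) with respect
# to params into the shadow buffer dparams.
function enzyme_reverse(x, params)
    dparams = zero(params)
    autodiff(Reverse, (x, p) -> sum(f(x, p)), Active,
             Duplicated(x, zero(x)), Duplicated(params, dparams))
    return dparams
end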

wsmoses commented 1 year ago

That was a misreading on my part. Since x isn't actually modified in place, you can also get away with the following instead.

I'll investigate the bug now.

# Forward mode, with x passed as Const:
values(only(autodiff(
    Forward,
    (x, p) -> sum(f(x, p)),
    BatchDuplicatedNoNeed,
    Const(x),
    BatchDuplicated(params, onehot(params)),
)))

# Reverse mode, with x passed as Const:
dparams = zero(params)
autodiff(Reverse, (x, p) -> sum(f(x, p)), Active, Const(x), Duplicated(params, dparams))
return dparams

wsmoses commented 1 year ago

Minimized reproducer; the bug is in activity analysis:

using Enzyme, ForwardDiff
function f(p, len)
    i = 1
    x = Vector{Float64}(undef, 1)
    while true
        tmp = Vector{Float64}(undef, 1)
        @inbounds tmp[1] = @inbounds x[1] + p
        x = tmp
        if i == len
            break
        end
        i += 1
    end
    return @inbounds x[1]
end

function enzyme_forward(x, params)
    return Enzyme.autodiff(Enzyme.Forward, f, DuplicatedNoNeed, Duplicated(1.0, 1.0), Const(2))
end

# Enzyme.API.printall!(true)
# Enzyme.API.printactivity!(true)
@show enzyme_forward([1.0], [1.0, 0.5])

wsmoses commented 1 year ago

https://fwd.gymni.ch/8KuoRv

wsmoses commented 1 year ago

https://github.com/EnzymeAD/Enzyme/pull/1095

wsmoses commented 1 year ago

other minimization:

using Enzyme

function f(x, params, len)
    i = 1
    # x = Vector{Float64}(undef, 1)
    while true
        tmp = Vector{Float64}(undef, 1)
        @inbounds tmp[1] = @inbounds x[1] + @inbounds params[i]
        x = tmp
        if i == len
            break
        end
        i += 1
    end
    return @inbounds x[1]
end

function enzyme_forward(x, params)
    return Enzyme.autodiff(Enzyme.Forward, f, DuplicatedNoNeed, Const(x), Duplicated(params, [1.0, 0.0]), Const(2))
end

Enzyme.API.printall!(true)
@show enzyme_forward([1.0], [1.0, 0.5])

# @show forward([1.0], [1.0, 0.5])

tmigot commented 1 year ago

Hi @wsmoses, I am not sure if it is related, but I also got a "Duplicated Returns not yet handled" error on this example (https://github.com/JuliaSmoothOptimizers/OptimizationProblems.jl/blob/main/src/ADNLPProblems/elec.jl), which might be simpler:

function f(x; n = n)
    return sum(
        sum(
            1 / sqrt((x[j] - x[i])^2 + (x[n + j] - x[n + i])^2 + (x[2n + j] - x[2n + i])^2) for
            j = (i + 1):n
        ) for i = 1:(n - 1)
    )
end
x = zeros(600)
g = similar(x)
gradient!(Reverse, g, f, x)

wsmoses commented 1 year ago

@tmigot that's an unrelated/separate issue: your function is type-unstable in its return value. Add Float64 as the forced return type and retry?

e.g.

function f(x; n = n)::Float64
    return sum(
        sum(
            1 / sqrt((x[j] - x[i])^2 + (x[n + j] - x[n + i])^2 + (x[2n + j] - x[2n + i])^2) for
            j = (i + 1):n
        ) for i = 1:(n - 1)
    )
end
x = zeros(600)
g = similar(x)
gradient!(Reverse, g, f, x)

Also, I highly recommend using the autodiff interface.

x = rand(600)     # or whatever your input array is
dx = zeros(600) # this should be zero init'd
Enzyme.autodiff(Reverse, f, x, dx)
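
A fully self-contained sketch that combines the return-type annotation with the explicitly annotated autodiff call used earlier in this thread (const n = 200 is an assumption chosen so that length(x) == 600, i.e. 3n coordinates):

using Enzyme

const n = 200   # assumed problem size: x holds 3n coordinates

function f(x; n = n)::Float64
    return sum(
        sum(
            1 / sqrt((x[j] - x[i])^2 + (x[n + j] - x[n + i])^2 + (x[2n + j] - x[2n + i])^2)
            for j = (i + 1):n
        ) for i = 1:(n - 1)
    )
end

x = rand(600)     # random points, as suggested above
dx = zeros(600)   # shadow must be zero-initialized
Enzyme.autodiff(Reverse, f, Active, Duplicated(x, dx))
# dx now holds the gradient of f at x
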
tmigot commented 1 year ago

Oh, I see. Thanks!

MilesCranmer commented 1 year ago

Sorry for the late reply. I feel a bit more comfortable with the syntax so I'm hoping to get to this soon.

By the way, congrats on the defense @wsmoses!

wsmoses commented 1 month ago

As this now successfully throws a runtime activity error, I am going to close (and it succeeds with runtime activity enabled).
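
For readers landing here later: assuming the current set_runtime_activity API, enabling runtime activity for the originally failing closure-over-x call would look roughly like this sketch (f, x, and params are the definitions from the original post; not part of the closing comment):

using Enzyme

# Hedged sketch: wrap the mode with set_runtime_activity so the closure
# capturing x is handled via runtime activity.
x = [1.0]
params = [1.0, 0.5]
Enzyme.gradient(Enzyme.set_runtime_activity(Enzyme.Forward), p -> sum(f(x, p)), params)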