ReverseDiff.value(output) depends on input of tape for mutating functions

frankschae commented 4 years ago

When using ReverseDiff to compute the gradients of an inplace function f!, we found that the output value of ReverseDiff depends on the input value of the tape (y*2 in the MWE). For the non-mutating input function f, one obtains in

(ReverseDiff.deriv(tu), ReverseDiff.deriv(tp), ReverseDiff.value(output)) = ([-8.0], [-16.0], [8.0])

However, for the mutating function, one gets

(ReverseDiff.deriv(tu), ReverseDiff.deriv(tp), ReverseDiff.value(output)) = ([-8.0], [-16.0], 16.0])

i.e., vec(du1) != ReverseDiff.value(output)) in the last line of the MWE. This happens for both a compiled and a non-compiled tape.


using ReverseDiff
p2 = [2.0]

f!(du,u,p,t) = du.=p[1]*u

f(u,p,t) = p[1]*u

y = [4.0]
λ = -y
t = 1.0
tape = ReverseDiff.GradientTape((y*2, p2, [t])) do u,p,t 
  du1 = similar(u, size(u))
  f!(du1,u,p,first(t))
  return vec(du1)
end

config = ReverseDiff.compile(tape)

# tape = ReverseDiff.GradientTape((y*2, p2, [t])) do u,p,t
#   vec(f(u,p,first(t)))
# end

tu, tp, tt = ReverseDiff.input_hook(config)
output = ReverseDiff.output_hook(config)
ReverseDiff.unseed!(tu) # clear any "leftover" derivatives from previous calls
ReverseDiff.unseed!(tp)
ReverseDiff.value!(tu, y)
ReverseDiff.value!(tp, p2)
ReverseDiff.value!(tt, [t])
ReverseDiff.forward_pass!(config)
ReverseDiff.increment_deriv!(output, λ)
ReverseDiff.reverse_pass!(config)
@show ReverseDiff.deriv(tu), ReverseDiff.deriv(tp), ReverseDiff.value(output)
### ([-8.0], [-16.0], [16.0])
du1 = similar(y, size(y))
f!(du1,y,p2,first(t))
vec(du1) == vec(f(y,p2,first(t)))
vec(du1) == ReverseDiff.value(output)

ChrisRackauckas commented 4 years ago

The behavior of this was correctly changing the input on v1.2.1 but not on v1.4, so unless @mohamed82008 has an idea what could've caused the change, the best thing to do might be to start with a git bisection to find the commit that changed the result here.

mohamed82008 commented 4 years ago

I will take a look.

ChrisRackauckas commented 4 years ago

@frankschae did you ever bisect this?

frankschae commented 4 years ago

no, I just checked that it doesn't occur at the v1.2.0 tag

frankschae commented 4 years ago

v1.3.0 is still fine, on v1.4.0 the error occurs.

(More precisely, If I checkout

commit: d1b9f22a0ba4e23710b7a28774bb10792dd811de Merge pull request #125 from JuliaDiff/mt/array_funcs fill, vcat, hcat, cat, reshape, any, and all custom adjoints

it looks ok but with the subsequent commit:

commit: a78ae7440c24af44bdc21dea6fd578b146c33677 broadcasting perf fix I obtain the wrong output value)

ChrisRackauckas commented 4 years ago

That commit looks like it only changes broadcast behavior, so I wonder, does it still fail if you change the test to: f!(du,u,p,t) = du[1]=p[1]*u[1] ?

frankschae commented 4 years ago

no, it's fine

f!(du,u,p,t) = du.=p[1]*u -> fails
f!(du,u,p,t) = du.=p[1]*u[1] -> works
f!(du,u,p,t) = du[1]=p[1]*u[1] -> works

ChrisRackauckas commented 4 years ago

boom, that narrows it down.

ChrisRackauckas commented 4 years ago

frankschae 5:57 PM doesn’t show up for oop. Both f(u,p,t) = @. p[1]u and f(u,p,t) = p[1]u work

Looks like it's only in the in-place definition.

mohamed82008 commented 4 years ago

The test above still fails after replacing the functions with:

f!(du,u,p,t) = du .= u
f(u,p,t) = u

and removing the entire broadcast.jl file introduced in the commit linked. This shows that the "test" script above does something funny. In fact, calling pull_value!(output) before running the test makes the test pass using the latest ReverseDiff. I think output_hook is just not for "user" use. TrackedReals in ReverseDiff are a bit funny in that they don't necessarily hold their actual value if the value comes from a parent TrackedArray, e.g. if the TrackedReal is the result of calling getindex on the parent TrackedArray. This is why you need to call pull_value! first before asking for the value of a TrackedReal. This will bring in their value from their parent TrackedArray if one exists.

Given the above analysis, I feel this is not a real issue. If you have a real use case, e.g. computing the gradient or jacobian of a function that fails compared to finite difference or ForwardDiff, please post it here. Otherwise, please close this issue if you find this answer satisfactory.

JuliaDiff / ReverseDiff.jl

ReverseDiff.value(output) depends on input of tape for mutating functions #143