Closed b-fg closed 2 months ago
I can't reproduce the ForwardDiff bug, are you perhaps using an older version of DI? Proper tag handling wasn't always present, but it should work in the latest version. The two versions below have the exact same output for me, while keeping the lines you mention commented out. Incidentally, you want `derivative` and not `gradient` here, because your input is scalar.
```julia
# ForwardDiff
dsim(θ), ForwardDiff.derivative(dsim, θ)

# ForwardDiff via DI
value_and_derivative(dsim, AutoForwardDiff(), θ)
```
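For reference, a self-contained toy version of that comparison (using a hypothetical stand-in function instead of the WaterLily `dsim`) would look like:

```julia
using DifferentiationInterface  # exports derivative, value_and_derivative, AutoForwardDiff
import ForwardDiff              # loads the ForwardDiff backend extension

# Toy stand-in for the WaterLily dsim: scalar in, scalar out
f(θ) = θ^2 + 3θ

θ = 2.0

# Plain ForwardDiff
y1, d1 = f(θ), ForwardDiff.derivative(f, θ)

# ForwardDiff via DI
y2, d2 = value_and_derivative(f, AutoForwardDiff(), θ)

@assert (y1, d1) == (y2, d2)  # both routes agree: (10.0, 7.0)
```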
Currently using v0.5.7:

```julia-repl
(WaterLily.jl-Examples) pkg> st DifferentiationInterface
Status `~/WaterLily.jl-Examples/Project.toml`
  [a0c0ee7d] DifferentiationInterface v0.5.7
```
I think that's the latest, right? So not sure what's going on. Also, thanks for that remark. Could you please clarify the practical difference in this case?
As for the Enzyme issue, I don't think it is related to DI: I get the error too when I call Enzyme directly. Presumably something in the WaterLily `Simulation` object makes Enzyme mad.
The `AutoEnzyme()` backend object picks the best mode based on the operator you're applying. In the case of `derivative`, it will be forward mode, while in the case of `gradient`, it will be reverse mode. You can force a given mode with a keyword argument:
```julia-repl
help?> AutoEnzyme
search: AutoEnzyme

  AutoEnzyme{M}

  Struct used to select the Enzyme.jl (https://github.com/EnzymeAD/Enzyme.jl) backend for automatic differentiation.

  Defined by ADTypes.jl (https://github.com/SciML/ADTypes.jl).

  Constructors
  ≡≡≡≡≡≡≡≡≡≡≡≡

  AutoEnzyme(; mode=nothing)

  Fields
  ≡≡≡≡≡≡

  •  mode::M: can be either
     •  an object subtyping EnzymeCore.Mode (like EnzymeCore.Forward or EnzymeCore.Reverse) if a specific mode is required
     •  nothing to choose the best mode automatically
```
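Concretely, forcing a mode looks like this (a minimal sketch with a toy function, assuming Enzyme.jl and DifferentiationInterface are both installed):

```julia
using DifferentiationInterface  # exports derivative and AutoEnzyme
import Enzyme                   # provides the Forward and Reverse mode objects

f(x) = x^2

d_auto = derivative(f, AutoEnzyme(), 1.0)                     # DI picks forward mode for derivative
d_fwd  = derivative(f, AutoEnzyme(mode=Enzyme.Forward), 1.0)  # forced forward mode
d_rev  = derivative(f, AutoEnzyme(mode=Enzyme.Reverse), 1.0)  # forced reverse mode
```

All three calls should return the same value, `2.0`, for this scalar function.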
Thanks for the info! And yes, that's the error I get with Enzyme. It points to a `memcpy` when creating the `Simulation` struct and says that "Enzyme cannot deduce type", but I cannot see where exactly this is happening.
> I think that's the latest, right? So not sure what's going on.
Can you run the full code below in a fresh Julia REPL (the initial environment doesn't matter) and see if it still errors?
> Also, thanks for that remark. Could you please clarify the practical difference in this case?
As for the operators:

- `derivative` is for functions f: R -> anything (shortcut for `pushforward` with `dx=1`)
- `gradient` is for functions f: anything -> R (shortcut for `pullback` with `dy=1`)

For scalar input and output, both should work, but I didn't make sure that `gradient` does, since this case is already covered by `derivative`, which will be more efficient.
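To make the distinction concrete, here is a minimal sketch (toy functions, not from the thread):

```julia
using DifferentiationInterface  # exports derivative, gradient, AutoForwardDiff
import ForwardDiff

f(x) = sin(x)        # R -> R:   derivative applies (pushforward with dx = 1)
g(v) = sum(abs2, v)  # R^n -> R: gradient applies (pullback with dy = 1)

d = derivative(f, AutoForwardDiff(), 0.0)       # cos(0.0) = 1.0
G = gradient(g, AutoForwardDiff(), [1.0, 2.0])  # [2.0, 4.0]
```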
> Thanks for the info! And yes, that's the error I get with Enzyme. It points to a `memcpy` when creating the `Simulation` struct and says that "Enzyme cannot deduce type", but I cannot see where exactly this is happening.
I would open an issue on the Enzyme.jl repo with my MWE (not using DifferentiationInterface at all) to see what @wsmoses thinks.
The code you shared does not error indeed. But note that it contains `ForwardDiff.derivative(dsim, θ)`, which I assume forces compilation of `dsim` with the appropriate `ForwardDiff.Dual` type. If this line is removed, the code produces an error for me.
And noted about Enzyme, I will open an issue there. Thanks!
Okay, now I got the error. It is very weird, because the first call to `ForwardDiff.derivative` should be completely independent of the second one to `DI.derivative`. Investigating.
Is there a version of the code without KernelAbstractions? I'm not familiar with that library, and its macros may mess with my understanding of what goes on inside `WaterLily.measure`.
Oh, note that we use `ForwardDiff` within the code:

So I think that the tags are getting mixed up there. That's why we sometimes create the `Dual` type with the appropriate tag, e.g.

```julia
T = typeof(ForwardDiff.Tag(dsim, Float64))
θ = ForwardDiff.Dual{T}(θ, one(Float64))
```
Actually, it seems KernelAbstractions may also mess with the tagging of the arguments, see for example:
To get KA out of the loop, just run with `julia -t 1` and WaterLily defaults to non-KA kernels.
Okay, we now have a simpler stacktrace, but the error is still here. Weirdly enough, in this order both calls succeed:

```julia
ForwardDiff.derivative(dsim, θ)
derivative(dsim, AutoForwardDiff(), θ)
```

but in this order both calls error:

```julia
derivative(dsim, AutoForwardDiff(), θ)
ForwardDiff.derivative(dsim, θ)
```

That's beyond super weird. Is there any global state in WaterLily that might be altered?
Yes, that is weird indeed :/ Could you please clarify what you mean by global state?
Maybe a global variable of some form that the first call could modify. I think you're right and it has to do with compilation though.
I have removed KA and removed the `@loop` macro in `measure!`, and I am still getting the same bug.
I think I know what is happening. If I'm right, the global state I'm looking for is `ForwardDiff.TAGCOUNT`. I was creating tags in a way that doesn't increment it properly. Let me try a quick fix: creating tags with `Tag(f, eltype(x))` (which increments the counter) instead of `Tag{typeof(f), eltype(x)}` (which doesn't).
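A minimal sketch of the difference between the two tag-creation styles (using ForwardDiff's `Tag`/`Dual` API and a toy stand-in for the simulation function):

```julia
import ForwardDiff
using ForwardDiff: Tag, Dual

dsim(θ) = θ^2  # toy stand-in for the real simulation function

# Constructor form: Tag(f, V) goes through ForwardDiff's tag-counting machinery,
# so the resulting tag is ordered consistently with tags created elsewhere
T_good = typeof(Tag(dsim, Float64))

# Raw type form: Tag{typeof(f), V} names the same type but skips that machinery
T_raw = Tag{typeof(dsim), Float64}

θd = Dual{T_good}(1.5, one(Float64))
ForwardDiff.value(θd), ForwardDiff.partials(θd, 1)  # (1.5, 1.0)
```

Note that `T_good` and `T_raw` are the same type; the difference is purely the counter side effect, which is what the tag ordering depends on.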
There is no such thing AFAIK. The only thing is that we use ForwardDiff within `measure`, and I have been doing some checks: that is indeed the problem. When I don't use the ForwardDiff functions in there, i.e. `ForwardDiff.jacobian`, `ForwardDiff.derivative`, and `ForwardDiff.gradient`, the error does not show up.
Can you try it with the branch from https://github.com/gdalle/DifferentiationInterface.jl/pull/357?
Works as intended! Thank you very much for the quick fix. And I will open an issue in Enzyme.jl regarding the other stuff. Cheers :)
Hey! I have been playing with this package to implement AD for WaterLily.jl. The ForwardDiff backend works as expected, and it provides the same result as using ForwardDiff without DI, but only if compilation is performed through the appropriate `Dual` type. For example, the following MWE works, but commenting out the ForwardDiff lines makes the DI call error because of the ordering of Dual tags.

Furthermore, switching to Enzyme with

throws an error (with a rather large stacktrace...). I am not sure how to specify forward or reverse mode when using the `AutoEnzyme()` backend, so it could be that it tries reverse mode by default and we have some incompatibility with that. Any help would be appreciated, thanks! :)
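A stripped-down sketch of the compile-ordering behavior described above (toy `dsim`, not the actual WaterLily simulation):

```julia
using DifferentiationInterface  # exports derivative and AutoForwardDiff
import ForwardDiff

dsim(θ) = θ^3  # toy stand-in for the WaterLily simulation call
θ = 2.0

# Compiling through plain ForwardDiff first creates the Dual tag for dsim...
r1 = ForwardDiff.derivative(dsim, θ)

# ...which, with DI v0.5.7, let the DI call below succeed;
# without the line above, the DI call errored due to Dual tag ordering
r2 = derivative(dsim, AutoForwardDiff(), θ)
```

Both calls return `12.0` here; in the real MWE the second call fails when the first is commented out.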