Open: hexaeder opened this issue 1 week ago
@hexaeder is this the code that segfaults in reverse mode? The error message implies you should use runtime activity, so that seems like the resolution. However, a segfault is clearly bad, so I want to make sure we fix that.
Hi! The segfault appears in my actual code, but I wasn't able to reproduce it in an MWE. If you're interested, I can try to write a script that builds the full objects using my packages and segfaults on the jacobian call, but I'm not sure how easy debugging in there would be.
I was mainly posting because the error message said something like "open an issue if you think Enzyme should be able to prove there is no runtime activity", and I don't see why the MWEs would require runtime activity...
yeah that would be helpful [it should never segfault].
It is indeed confusing why it would require runtime activity here, though.
Unfortunately, I was not able to reproduce the problem with a plain Enzyme call, only with a jacobian call from DifferentiationInterface. I tried plain Enzyme.autodiff, but the problem might be related to the batch mode used by DI, which I am not able to invoke myself.
Additional observations:
NNlib. Probably because of newer package versions?
using Pkg
@assert VERSION == v"1.10.5"
pkg"activate --temp"
pkg"add NetworkDynamics#3e99370, Enzyme, Graphs, DifferentiationInterface"
using NetworkDynamics, Graphs, Enzyme
using Enzyme: Enzyme
using DifferentiationInterface: DifferentiationInterface as DI
# we need to load some test utils from NetworkDynamics
include(joinpath(pkgdir(NetworkDynamics),"test","ComponentLibrary.jl"))
# setup of the system
g = complete_graph(4)
vf = Lib.kuramoto_second()
ef = [Lib.diffusion_odeedge(),
Lib.kuramoto_edge(),
Lib.kuramoto_edge(),
Lib.diffusion_edge_fid(),
Lib.diffusion_odeedge(),
Lib.diffusion_edge_fid()]
nw = Network(g, vf, ef)
x0 = rand(dim(nw))
dx = zeros(dim(nw))
p0 = rand(pdim(nw))
# this is the rhs we want to differentiate
# the last argument is time but it is not used in the system so I use NaN.
nw(dx, x0, p0, NaN)
# this call segfaults
DI.jacobian(nw, dx, DI.AutoEnzyme(mode=Enzyme.set_runtime_activity(Enzyme.Reverse), function_annotation=Enzyme.Duplicated), x0, DI.Constant(p0), DI.Constant(NaN))
@gdalle re DI segfault
In this case, nw seems to be an out-of-place function? If so, DI.jacobian uses split reverse mode with autodiff_thunk to be able to pass arbitrary adjoints with array outputs. It also wraps BatchDuplicated around the inputs (and around the function itself because of function_annotation). So a pure-Enzyme MWE would have to use all of these ingredients.
Can you maybe boil it down to a simpler function passed to DI.jacobian? That would facilitate our investigation.
Also note that because DI uses a split reverse mode instead of a standard mode here, any information you pass to the mode object inside AutoEnzyme is currently lost. Once https://github.com/EnzymeAD/Enzyme.jl/pull/1979 is merged, I can perform a better conversion and preserve settings like runtime activity.
Ah, I didn't realize that the segfault only happens in the more complex use case via DI, not with my very simple Enzyme.autodiff call.
When using DI.jacobian, both MWEs from the initial post actually crash Julia. Because it's shorter, here's the second one:
using Pkg
pkg"activate --temp"
pkg"add Enzyme, DifferentiationInterface"
using Enzyme: Enzyme
using DifferentiationInterface: DifferentiationInterface as DI
struct Functor{RT}
range::RT
end
function (f::Functor)(du, u, p, t)
r = f.range
# r = 1:4 # this literal would work
_du = view(du, r)
_p = view(p, r)
_du .= _p
nothing
end
f = Functor(1:4)
# test normal function call
dx, x, p, t = zeros(4), zeros(4), collect(1.0:4.0), NaN
f(dx, x, p, t)
@assert dx == 1:4
#💣
DI.jacobian(f, dx, DI.AutoEnzyme(mode=Enzyme.set_runtime_activity(Enzyme.Reverse), function_annotation=Enzyme.Duplicated), x, DI.Constant(p), DI.Constant(NaN))
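As a backend-independent reference value (a sketch added for illustration, not part of the original report), the expected Jacobians of this functor can be computed by finite differences. The helper fd_jacobian is hypothetical; it perturbs one input entry at a time. It shows that du does not depend on u at all, so the Jacobian with respect to x is the 4×4 zero matrix, while du is a copy of p[1:4], so the dependence on p is the identity. That flow from constant p into differentiable du is what the activity error quoted later in the thread complains about.

```julia
# Functor from the MWE above: copies p[range] into du[range]; u and t are unused.
struct Functor{RT}
    range::RT
end

function (f::Functor)(du, u, p, t)
    r = f.range
    _du = view(du, r)
    _p = view(p, r)
    _du .= _p
    nothing
end

# Hypothetical helper: forward-difference Jacobian of an in-place g!(y, v)
# with respect to v; `y` is only used as a template for the output size.
function fd_jacobian(g!, y, v; h=1e-6)
    y0 = zero(y)
    g!(y0, v)
    J = zeros(length(y), length(v))
    for j in eachindex(v)
        vp = copy(v)
        vp[j] += h              # perturb a single input entry
        y1 = zero(y)
        g!(y1, vp)
        J[:, j] .= (y1 .- y0) ./ h
    end
    return J
end

f = Functor(1:4)
dx, x, p, t = zeros(4), zeros(4), collect(1.0:4.0), NaN

J_x = fd_jacobian((y, v) -> f(y, v, p, t), dx, x)  # d(du)/du: no dependence
J_p = fd_jacobian((y, v) -> f(y, x, v, t), dx, p)  # d(du)/dp: identity
```

Whatever backend is used, DI.jacobian(f, dx, backend, x, DI.Constant(p), DI.Constant(NaN)) differentiates with respect to x and should therefore return a matrix matching J_x.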
Thanks for the smaller MWE!
Here we are working with an in-place function, so DI can use autodiff directly and the remarks about split mode from earlier don't apply. Also, your function is a functor but it does not contain differentiable data, so the right annotation here would be function_annotation=Enzyme.Const and not Enzyme.Duplicated. Actually, we can dispense with the annotation altogether, because Enzyme can prove that this enclosed data is read-only.
So in the end, this is a tale of two runtime activities. This version errors, but the error is pretty self-explanatory: you copied data from p (constant) to the output dx (differentiable).
backend_errors = DI.AutoEnzyme(; mode=Enzyme.Reverse)
DI.jacobian(f, dx, backend_errors, x, DI.Constant(p), DI.Constant(NaN))
ERROR: Constant memory is stored (or returned) to a differentiable variable.
As a result, Enzyme cannot provably ensure correctness and throws this error.
This might be due to the use of a constant variable as temporary storage for active memory (https://enzyme.mit.edu/julia/stable/faq/#Runtime-Activity).
If Enzyme should be able to prove this use non-differentable, open an issue!
To work around this issue, either:
a) rewrite this variable to not be conditionally active (fastest, but requires a code change), or
b) set the Enzyme mode to turn on runtime activity (e.g. autodiff(set_runtime_activity(Reverse), ...) ). This will maintain correctness, but may slightly reduce performance.
Meanwhile, this version segfaults:
backend_segfaults = DI.AutoEnzyme(; mode=Enzyme.set_runtime_activity(Enzyme.Reverse))
DI.jacobian(f, dx, backend_segfaults, x, DI.Constant(p), DI.Constant(NaN))
My best guess is that the problem comes from the runtime activity analysis?
@gdalle can you boil the DI sugar down to something that errors with just Enzyme calls? And paste the stack trace.
Sounds good, I'll try that tomorrow, logging out for the day!
I am trying to make the RHS of an ODEProblem Enzyme-compatible. My function has the signature (du, u, p, t), and I try to differentiate du with respect to u, with p and t held constant. I hit the error for some operations which use p in a calculation for du. I am quite new to Enzyme and don't fully understand this error, but on very simple examples it isn't a problem to use Const(p) to calculate Duplicated(du).
I boiled it down to 2 MWEs. The first MWE is closer to my actual code, including loop unrolling. The second MWE seems to error because of the broadcasting but does not need the loop unrolling to fail. I am not sure whether both demonstrate the same or different problems.
Both examples have been created on Julia 1.10.5 and Enzyme 0.13.8. I am aware of set_runtime_activity, which works for forward mode in my actual example but segfaults for reverse mode...
MWE 1
MWE 2